Detailed Statistical Supplement

Simulated Patient-Level Data and MAIC Methodology

1 Document Overview

1.1 Introduction

This supplemental document provides a transparent and reproducible framework for the data generation and methods used in our matching-adjusted indirect comparison (MAIC) analysis.

Specifically, this document:

  1. Provides summary-level baseline and safety data from published studies (IMspire150, COLUMBUS, COMBI-D/COMBI-v, and coBRIM).
  2. Demonstrates the process of extracting survival data from published Kaplan-Meier curves using digitization and reconstruction methods.
  3. Presents a worked example of the MAIC process, including:
    • Simulation of individual patient-level data (IPD) for the NIVO+RELA cohort
    • Matching methodology
    • Example analysis of binary (overall response rate) and time-to-event (overall survival) outcomes

To maintain confidentiality, IPD from RELATIVITY-047 are simulated for illustrative purposes only. These data do not represent real patients or actual study results.

For simplicity, the worked example focuses on overall response rate (ORR) and overall survival (OS), but the same methodology applies to other outcomes. All analyses are conducted in R.
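As a preview of the binary-outcome side of the worked example, a MAIC-adjusted ORR is simply a weight-standardized proportion. The responder indicators and weights below are invented for illustration; in practice the weights come from the matching step described later.

```r
# Illustrative only: made-up responder indicators and hypothetical MAIC weights
resp <- c(1, 0, 1, 1, 0, 1)              # 1 = responder, 0 = non-responder
w    <- c(1.2, 0.8, 1.0, 1.1, 0.9, 1.0)  # hypothetical weights from matching

orr_unweighted <- mean(resp)             # naive ORR
orr_weighted   <- sum(w * resp) / sum(w) # MAIC-adjusted ORR
```

The same weight-standardization idea carries over to time-to-event outcomes, where the weights enter the survival estimator instead.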

1.2 MAIC Process Schematic

Show R Code
library(ggplot2)
library(gridExtra)
library(grid)
library(gt)
library(dplyr)
library(survival)
library(survminer)

######################################################################################################
## Create Tables and Plots
######################################################################################################
# === Dummy KM Plot ===
combi_km_data <- tibble(time = c(0, 5, 10, 15, 20, 25, 30),
                        surv = c(1, 0.9, 0.75, 0.6, 0.4, 0.2, 0.1))
combi_km_plot <- ggplot(combi_km_data, aes(x = time, y = surv)) +
  geom_step(linewidth = 1.2, color = "steelblue") +
  theme_minimal() +
  labs(title = "Kaplan-Meier Curve") +
  theme(axis.title = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), 
        plot.title = element_text(hjust = 0.5, face = "bold", size = 12))
combi_km_grob <- ggplotGrob(combi_km_plot)

# Dummy RELA KM Curve
rela_km_data <- tibble(time = c(0, 5, 10, 15, 20), surv = c(1, 0.85, 0.7, 0.5, 0.3))
rela_km_plot <- ggplot(rela_km_data, aes(x = time, y = surv)) +
  geom_step(linewidth = 1.2, color = "firebrick") +
  theme_minimal() +
  labs(title = "Kaplan-Meier Curve") +
  theme(axis.title = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), 
        plot.title = element_text(hjust = 0.5, face = "bold", size = 12))
rela_km_grob <- ggplotGrob(rela_km_plot)

# Overlaid KM Curves
overlay_km_plot <- ggplot() +
  geom_step(data = combi_km_data, aes(x = time, y = surv), color = "steelblue", linewidth = 1.2) +
  geom_step(data = rela_km_data, aes(x = time, y = surv), color = "firebrick", linewidth = 1.2) +
  theme_minimal() +
  labs(title = "MAIC Applied KM Curves") +
  theme(axis.title = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), 
        plot.title = element_text(hjust = 0.5, face = "bold", size = 12))
overlay_km_grob <- ggplotGrob(overlay_km_plot)

# === Create Fake Survival Tables ===
pseudo_ipd_surv_df <- tibble(
  time = c(0.386, 0.802, 1.03, 1.302, 1.522),
  event = c(0, 1, 0, 1, 0)
)

rela_ipd_surv_df <- tibble(
  time = c(0.42, 0.88, 1.12, 1.45, 1.9),
  event = c(0, 1, 1, 0, 1)
)

pseudo_ipd_surv_grob <- tableGrob(
  pseudo_ipd_surv_df, 
  rows = NULL, 
  theme = ttheme_default(
    base_size = 6.8,
    padding = unit(c(0.5, 0.5), "mm")
  )
)

rela_ipd_surv_grob <- tableGrob(
  rela_ipd_surv_df, 
  rows = NULL, 
  theme = ttheme_default(
    base_size = 6.8,
    padding = unit(c(0.5, 0.5), "mm")
  )
)

# === Dataframes ===
rela_df <- tibble(
  ID = 1:4,
  Female = c(1, 0, 1, 0),
  Age = c(47, 52, 77, 61),
  ECOG = c(0, 0, 1, 0),
  Time = c(5, 10, 15, 20),
  Event = c(1, 0, 1, 1),
  trt = rep("NIVO+RELA", 4)
)
rela_df_weighted <- rela_df |> mutate(Weight = c(1.2, 0.8, 1.0, 1.1))

# Summary Level Data from Combi Trials
combi_df <- tibble(
  Female = 0.433,
  `Age <55` = 0.500,
  `ECOG 0` = 0.715,
  trt = "DAB+TRAM"
)

# === Convert tables and shrink them using theme ===
## Relativity 047
rela_grob <- tableGrob(
  rela_df, 
  rows = NULL, 
  theme = ttheme_default(
    base_size = 6.8,  # Keeps text readable
    padding = unit(c(1, 1), "mm")  # Reduce vertical and horizontal padding
  )
)
rela_weighted_grob <- tableGrob(
  rela_df_weighted, 
  rows = NULL, 
  theme = ttheme_default(
    base_size = 6.8,
    padding = unit(c(1, 1), "mm")
    )
  )
## Combi
combi_grob <- tableGrob(
  combi_df, 
  rows = NULL, 
  theme = ttheme_default(
    base_size = 6.8,
    padding = unit(c(1, 1), "mm")
    )
  )



######################################################################################################
## Create Visualization Layers
######################################################################################################
# === Base plot ===
schema_plot <- ggplot() + 
  xlim(0, 20) + ylim(0, 10) + 
  theme_void()

# === Data Acquisition & Processing background (left + middle columns) ===
schema_plot <- schema_plot +
  geom_rect(aes(
    xmin = 0.0, xmax = 10.0, 
    ymin = 0.8, ymax = 9.2), 
          fill = "#B3CDE3", alpha = 0.2, color = "black") +
  

  # Matched Analysis & Outcomes background (right column)
  geom_rect(aes(
    xmin = 10.0, xmax = 20, 
    ymin = 0.8, ymax = 9.2), 
            fill = "#F4A582", alpha = 0.2, color = "black") 


# === Boxes === (Set width uniformly: xmin = x, xmax = x + 4.75)
schema_plot <- schema_plot +
  # Published Trials
  geom_rect(aes(
    xmin = 0.125, xmax = 4.875, 
    ymin = 5.1, ymax = 9), fill = "#BBDEFB", color = "black") +
  # RELATIVITY-047
  geom_rect(aes(
    xmin = 0.125, xmax = 4.875, 
    ymin = 1, ymax=4.9), fill = "#FFCDD2", color = "black") 

# Pseudo and Observed IPD Column (second column)
schema_plot <- schema_plot +
  geom_rect(aes(
    xmin = 5.0, xmax = 9.875, 
    ymin = 5.1, ymax = 9), fill = "#D7F7E4", color = "black") +

  geom_rect(aes(
    xmin = 5.0, xmax = 9.875, 
    ymin = 1, ymax=4.9), fill = "#D7F7E4", color = "black") 

# Weighted Column (third column)
schema_plot <- schema_plot +
  geom_rect(aes(
    xmin = 10.125, xmax = 14.875, 
    ymin = 5.1, ymax = 9), fill = "#FFF0B3", color = "black") +

  geom_rect(aes(
    xmin = 10.125, xmax = 14.875, 
    ymin = 1, ymax=4.9), fill = "#FFF0B3", color = "black") 

# Comparison Column (fourth column)
schema_plot <- schema_plot +
  geom_rect(aes(
    xmin = 15, xmax = 19.875, 
    ymin = 1, ymax = 9), fill = "#FFB74D", color = "black") 

# Box behind WebPlotDigitizer
schema_plot <- schema_plot +
  geom_rect(aes(
    xmin = 4.0, xmax = 6.0, 
    ymin = 5.9, ymax = 6.5), fill = "#FFD580", color = "black") 

# Box behind Use Observed IPD
schema_plot <- schema_plot +
  geom_rect(aes(
    xmin = 4.0, xmax = 6.0, 
    ymin = 2.18, ymax = 2.5), fill = "#FFD580", color = "black") 
# Box behind Matching Process
schema_plot <- schema_plot +
  geom_rect(aes(
    xmin = 8.9, xmax = 10, 
    ymin = 4.38, ymax = 5.5), fill = "#FFD580", color = "black") 

# === Kaplan Meier Curves ===
schema_plot <-  schema_plot +
  # Combi KM Curve
  annotation_custom(
    combi_km_grob, 
    xmin = 0.75, xmax = 4, 
    ymin = 5.1, ymax = 6.6)  +
  # Embed RELA KM curve under the IPD table
  annotation_custom(
    rela_km_grob, 
    xmin = 0.75, xmax = 4, 
    ymin = 1.1, ymax = 2.6) +
  # Embed overlaid MAIC KM curve in the comparison column
  annotation_custom(
    overlay_km_grob, 
    xmin = 16, xmax = 19.25, 
    ymin = 4.5, ymax = 6.2) 




# === Embed shrunk GROBs (Summary Level and IPD Data Tables) ===
schema_plot <- schema_plot +
  # combi table
  annotation_custom(combi_grob, 
                    xmin = 1.5, xmax = 3.5, 
                    ymin = 6.8, ymax = 7.8) +
  annotation_custom(combi_grob, 
                    xmin = 7.0, xmax = 8.0, 
                    ymin = 6.8, ymax = 7.8) +
  annotation_custom(combi_grob, 
                    xmin = 11.5, xmax = 13.5, 
                    ymin = 6.8, ymax = 7.8) +
  # relativity 047 tables
  annotation_custom(rela_grob, 
                    xmin = 1.5, xmax = 3.5, 
                    ymin = 2.7, ymax = 4.0) +
  annotation_custom(rela_grob, 
                    xmin = 6.5, xmax = 8.5, 
                    ymin = 2.7, ymax = 4.0) +
  annotation_custom(rela_weighted_grob, 
                    xmin = 12.0, xmax = 13, 
                    ymin = 2.7, ymax = 4.0) 


# === Embed Survival Tables ===
# Add survival tables in the second column, matching the height of KM curves
schema_plot <- schema_plot +
  # Pseudo-IPD survival table under the Published Data green box
  annotation_custom(
    grob = grobTree(
      pseudo_ipd_surv_grob, 
      vp = viewport(
        width = 0.7, height = 0.6)),
    xmin = 7, xmax = 8, 
    ymin = 5.25, ymax = 6.4
  ) +
  
  # Pseudo-IPD survival table under the Published Data Yellow box
  annotation_custom(
    grob = grobTree(
      pseudo_ipd_surv_grob, 
      vp = viewport(
        width = 0.7, height = 0.6)),
    xmin = 12, xmax = 13, 
    ymin = 5.25, ymax = 6.4
  ) +
  # RELATIVITY survival table under the RELATIVITY-047 green box
  annotation_custom(
    grob = grobTree(
      rela_ipd_surv_grob, 
      vp = viewport(
        width = 0.7, height = 0.6)),
    xmin = 7, xmax = 8, 
    ymin = 1.3, ymax = 2.45
  ) +
  # RELATIVITY survival table under the RELATIVITY-047 Yellow box
  annotation_custom(
    grob = grobTree(
      rela_ipd_surv_grob, 
      vp = viewport(
        width = 0.7, height = 0.6)),
    xmin = 12, xmax = 13, 
    ymin = 1.3, ymax = 2.45
  ) 

# === Arrows ===
schema_plot <- schema_plot +
  # Arrow under WebPlotDigitizer from first to second column
  geom_curve(aes(
    x = 4.5, xend = 5.5, 
    y = 5.8, yend = 5.8), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = 0) +

  # Arrow under use observed IPD first to second column
  geom_curve(aes(
    x = 4.5, xend = 5.5, 
    y = 2, yend = 2), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = 0) +

  # First row arrow between second and third column
  geom_curve(aes(
    x = 9.5, xend = 10.5, 
    y = 5.8, yend = 5.8), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = 0) +
  # Second row arrow between second and third column
  geom_curve(aes(
    x = 9.5, xend = 10.5, 
    y = 2, yend = 2), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = 0) +
  
  # Curve from Combi summarized table to weighted IPD
  geom_curve(
    aes(
      x = 9.4, xend = 10.1, 
      y = 7.1, yend = 4.0), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = -0.3, color = "darkblue") +
  
  # Curve from Relativity IPD to Weighted IPD
  geom_curve(
    aes(
      x = 9.4, xend = 10.1, 
      y = 4.0, yend = 4.0), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = -0.3, color = "darkblue") +
  
  # Curve from combi summarized table to MAIC KM 
  geom_curve(
    aes(
      x = 14.99, xend = 16.0, 
      y = 7.1, yend = 5.25), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = 0, color = "darkblue") +
  
  # Curve from combi survival table to MAIC KM 
  geom_curve(
    aes(
      x = 14.99, xend = 16.0, 
      y = 6, yend = 5.25), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = 0, color = "darkblue") +
  # Curve from RELATIVITY-047 summarized table to MAIC KM 
  geom_curve(
    aes(
      x = 14.99, xend = 16.0, 
      y = 3.5, yend = 5.25), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = 0, color = "darkblue") +
  
  # Curve from RELATIVITY-047 survival table to MAIC KM 
  geom_curve(
    aes(
      x = 14.99, xend = 16.0, 
      y = 2.5, yend = 5.25), 
    arrow = arrow(length = unit(0.3, "cm")), curvature = 0, color = "darkblue") 


  


# === Add Labels ===
schema_plot <- schema_plot +
  # Background Box  - Label Left
  annotate("text", 
           x = 5.75, y = 9.4, label = "Data Acquisition & Processing", 
           size = 4, fontface = "bold") +
  # Background Box - Label Right
  annotate("text", 
           x = 15.5, y = 9.4, label = "Matched Analysis & Outcomes",
           size = 4, fontface = "bold") +
  # Published Trial Boxes Labels (Top row)
  ## First Column
  annotate("text",
           x = 2.5, y = 8.9,
           label = "Published Trials\nSummary-Level Data &\nKaplan-Meier Curves", 
           size = 4, fontface = "bold",
           vjust = 1, 
           hjust = 0.5) +
  ## Second Column
  annotate("text",
           x = 7.5, y = 8.9, 
           label = "Published Trials\nSummary-Level Data &\n Pseudo-IPD Survival Table", 
           size = 4, fontface = "bold",
           vjust = 1, 
           hjust = 0.5) +
  ## Third Column
  annotate("text",
           x = 12.5, y = 8.9, 
           label = "Published Trials\nSummary-Level Data &\n Pseudo-IPD Survival Table", 
           size = 4, fontface = "bold",
           vjust = 1, 
           hjust = 0.5) +
  # Fourth Column 
  annotate("text",
           x = 17.5, y = 8.9, 
           label = "MAIC\n Published Trials &\nRELATIVITY-047", 
           size = 4, fontface = "bold",
           vjust = 1, 
           hjust = 0.5) +
  # RELATIVITY-047 Labels
  ## First Column
  annotate("text",
           x = 2.5,y = 4.75, 
           label = "RELATIVITY-047\nIndividual Patient Data", 
           size = 4, fontface = "bold",
           vjust = 1, 
           hjust = 0.5) +
  # Second Column  
  annotate("text",
           x = 7.5,y = 4.75, 
           label = "RELATIVITY-047\nIndividual Patient Data", 
           size = 4, fontface = "bold",
           vjust = 1, 
           hjust = 0.5) +
  # Third Column  
  annotate("text",
           x = 12.5,y = 4.75, 
           label = "RELATIVITY-047\n Weighted IPD", 
           size = 4, fontface = "bold",
           vjust = 1, 
           hjust = 0.5) +  
  
  # Published Trials - Summary Patient Characteristics
  annotate("text", 
           x = 2.5, y = 7.7, 
           label = "Patient Characteristics - Summarized", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +
  annotate("text", 
           x = 7.5, y = 7.7, 
           label = "Patient Characteristics - Summarized", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +
  annotate("text",
           x = 12.5, y = 7.7, 
           label = "Patient Characteristics - Summarized", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +
  
  # Observed IPD - Individual Patient Data
  annotate("text", 
           x = 2.5, y = 4.0, 
           label = "Patient Characteristics - IPD", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +
  # Observed IPD - Individual Patient Data
  annotate("text", 
           x = 7.5, y = 4.0, 
           label = "Patient Characteristics - IPD", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +
  # Observed IPD - Individual Patient Data
  annotate("text", 
           x = 12.5, y = 4.0, 
           label = "Patient Characteristics - Weighted IPD", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +

  # Survival Table Headers (Top survival table)
  annotate("text", 
           x = 7.5, y = 6.5, 
           label = "Survival Table", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +
  annotate("text", 
           x = 12.5, y = 6.5, 
           label = "Survival Table", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +
  
  # Survival Table Headers (Bottom survival table)
  annotate("text", 
           x = 7.5, y = 2.55, 
           label = "Survival Table", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +
  # Survival Table Headers (Bottom survival table)
  annotate("text", 
           x = 12.5, y = 2.55, 
           label = "Survival Table", size = 3.5, fontface = "bold", 
           vjust = 1, 
           hjust = 0.5) +
  # Label for KM to Survival table arrow for combi  
  annotate("text", 
           x = 5.0, y = 6.4, 
           label = "Extract Pseudo-IPD\n(WebPlotDigitizer)", size = 3, 
           vjust = 1, 
           hjust = 0.5) +
  # Label for KM to Survival table arrow for relativity 047
  annotate("text", 
           x = 5.0, y = 2.4, 
           label = "Use Observed IPD", size = 3, 
           vjust = 1, 
           hjust = 0.5) +  
  # Annotate label for matching step
  annotate("text", 
           x = 9.45, y = 5.4, 
           label = "Matching\nProcess\n(Weights\nEstimated)", size = 3, fontface = "italic", color = "darkblue", 
           vjust = 1, 
           hjust = 0.5) 


# === Title ===
schema_plot <- schema_plot +
  annotate("text", 
           x = 10, y = 9.99, 
           label = "Overall Schema of MAIC Process", 
           size = 5.5, fontface = "bold",
           hjust = 0.5,
           vjust = 1)

# === Render ===
print(schema_plot)

1.2.1 Figure Legend: MAIC Process Schematic

This schematic provides a high-level overview of the MAIC framework, using the comparison of Nivolumab + Relatlimab (NIVO+RELA) and Dabrafenib + Trametinib (DAB+TRAM) as an illustrative example. The figure is designed to demonstrate key concepts in the process and does not capture every step involved in the full analysis. All data shown are simulated and intended solely for illustration. They do not represent actual patient-level data or trial results used in the analysis or manuscript.

1.2.1.1 Data Acquisition & Processing (Blue / Green Panels)

  • Published Trials: Summary-level patient characteristics and Kaplan-Meier (KM) curves are abstracted from published sources.
  • Pseudo-IPD Reconstruction: The KM curves are digitized using WebPlotDigitizer to generate pseudo-individual patient data (pseudo-IPD) with survival times and event indicators.
  • RELATIVITY-047 IPD: Individual patient-level data (IPD) from the RELATIVITY-047 trial are used as the observed dataset, including baseline characteristics and survival outcomes.
  • Survival Tables: Both datasets are converted into survival tables to prepare for matching.
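The pseudo-IPD reconstruction step can be sketched as follows. This is a deliberately simplified stand-in for the full reconstruction algorithm (Guyot et al. 2012), which also uses published numbers at risk to place censoring within intervals; here each drop in the digitized curve is converted into events and all remaining patients are censored at last follow-up. The coordinates and the starting sample size below are invented.

```r
# Simplified sketch only: digitized KM coordinates -> approximate pseudo-IPD.
# All values are invented for illustration.
km_coords <- data.frame(time = c(0, 5, 10, 15, 20),
                        surv = c(1.00, 0.90, 0.75, 0.60, 0.40))
n_start <- 100  # assumed number at risk at time zero

rows <- lapply(seq_len(nrow(km_coords) - 1), function(i) {
  # Events implied by the drop in survival between adjacent digitized points
  n_events <- round(n_start * (km_coords$surv[i] - km_coords$surv[i + 1]))
  if (n_events == 0) return(NULL)
  data.frame(time = rep(km_coords$time[i + 1], n_events), event = 1)
})
pseudo_ipd <- do.call(rbind, rows)

# Patients still alive at the last digitized time are censored there
n_censored <- n_start - nrow(pseudo_ipd)
pseudo_ipd <- rbind(pseudo_ipd,
                    data.frame(time = rep(max(km_coords$time), n_censored),
                               event = 0))
```

The resulting two-column (time, event) table is exactly the survival-table format shown in the schematic.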

MAIC schema

1.2.1.2 Matched Analysis & Outcomes (Yellow / Orange Panels)

  • Matching Process: The RELATIVITY-047 IPD is reweighted to align baseline characteristics with the published comparator data. Estimated weights ensure comparability across covariates such as age, sex, and ECOG performance status.
  • Weighted IPD and Pseudo-IPD: Both the pseudo-IPD and the weighted RELATIVITY-047 data are prepared for outcome comparison.
  • MAIC Comparison: The adjusted overall survival (OS) and overall response rate (ORR) are compared between NIVO+RELA and DAB+TRAM using the reweighted datasets.
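The matching process is the method-of-moments approach of Signorovitch et al.: IPD covariates are centred on the comparator trial's published means, and weights of the form w = exp(Xa) are obtained by minimising the sum of the weights. A minimal sketch, with simulated covariates and the DAB+TRAM summary values quoted in the schematic (all IPD values are invented):

```r
set.seed(42)
# Simulated IPD covariates (illustrative only)
ipd <- cbind(female = rbinom(200, 1, 0.50),
             ecog0  = rbinom(200, 1, 0.60))
target <- c(female = 0.433, ecog0 = 0.715)  # published comparator means

X <- sweep(ipd, 2, target)                  # centre covariates on the targets
fit <- optim(c(0, 0), function(a) sum(exp(X %*% a)), method = "BFGS")
w <- as.vector(exp(X %*% fit$par))          # method-of-moments MAIC weights

# After weighting, the IPD means match the published comparator values
maic_means <- colSums(ipd * w) / sum(w)
ess <- sum(w)^2 / sum(w^2)                  # effective sample size after matching
```

Reporting the effective sample size alongside the weighted results is standard practice, since heavily skewed weights can leave few effective patients.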

1.2.1.3 Visualization Elements

  • Blue KM curves represent the comparator arm (DAB+TRAM) reconstructed from published data.
  • Red KM curves represent the RELATIVITY-047 IPD (NIVO+RELA).
  • Overlaid KM plot shows survival estimates post-MAIC adjustment.
  • Tables summarize patient characteristics and survival data at each step.
  • Arrows and labels denote the flow of data through extraction, weighting, and comparison.
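The adjusted curves in the overlay come from weighted survival estimates. As a self-contained illustration, a weighted product-limit (Kaplan-Meier) estimator can be written directly (ignoring tied event times for brevity); in the analyses themselves, survfit() with its weights argument serves the same purpose.

```r
# Minimal weighted Kaplan-Meier sketch (assumes no tied event times).
# With all weights equal to 1 this reduces to the usual KM estimator.
weighted_km <- function(time, event, w) {
  ord <- order(time)
  time <- time[ord]; event <- event[ord]; w <- w[ord]
  at_risk <- sum(w)                           # weighted risk set
  surv <- 1
  out <- data.frame(time = numeric(0), surv = numeric(0))
  for (i in seq_along(time)) {
    if (event[i] == 1) {                      # weighted step down at each event
      surv <- surv * (1 - w[i] / at_risk)
      out <- rbind(out, data.frame(time = time[i], surv = surv))
    }
    at_risk <- at_risk - w[i]                 # subject leaves the risk set
  }
  out
}

# Unit weights reproduce the standard KM estimate on a toy dataset
km <- weighted_km(time = c(2, 4, 6, 8), event = c(1, 1, 0, 1), w = rep(1, 4))
```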

2 Core Libraries for Analysis

2.1 Statistical & Visualization Toolkit

👇 Expand the code chunk below to view the packages necessary for this project

Show R Code
## Clear workspace for reproducibility
rm(list = ls())

# Function to load/install packages (avoids clutter & repetition)
load_pkg <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
  library(pkg, character.only = TRUE)
}

# Load core data manipulation and plotting packages
core_packages <- c(
  "tidyverse",   # dplyr, ggplot2, readr, etc.
  "data.table",  # Fast reading and manipulation
  "MASS",        # Statistical models
  "splines",     # Spline functions for survival curves
  "survival",    # Survival analysis
  "survminer",   # Publication-ready survival plots
  "ggplot2",     # Visualization
  "scales",      # Axis scaling for ggplots
  "grid", "gridExtra",  # Grid layouts for complex plots
  "gt", "gtsummary",    # Beautiful tables
  "knitr", "kableExtra", # Reporting & enhanced tables
  "here",        # Reproducible project paths
  "jpeg",        # Import JPEG images
  "sandwich",    # Robust standard errors
  "survey",      # Complex survey design / weighting
  "htmltools",   # HTML manipulation (for scrollable tables)
  "shiny"        # Optional: Shiny app integration
)
Show R Code
# Load all core packages
lapply(core_packages, load_pkg)

3 Load Data and Simulate Patient-Level Dataset

3.1 Data Overview

In this section, we load summary-level baseline and safety data from four published studies:

  • IMspire150
  • COLUMBUS
  • COMBI-d/COMBI-v
  • coBRIM

These datasets are provided as CSV files in the repository and are organized under the files directory.

We also generate a simulated individual patient-level dataset for the NIVO+RELA cohort. This dataset is used for demonstration purposes only and does not represent real patient data.
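The simulation step can be sketched as below. Every prevalence and the event hazard are assumptions chosen for demonstration; as with the rest of this document, nothing here reflects actual RELATIVITY-047 data.

```r
set.seed(2024)
n <- 200  # illustrative cohort size, not the trial enrolment

# Baseline characteristics drawn from assumed prevalences
sim_ipd <- data.frame(
  id     = seq_len(n),
  female = rbinom(n, 1, 0.40),
  ecog0  = rbinom(n, 1, 0.67),
  age55  = rbinom(n, 1, 0.50),
  trt    = "NIVO+RELA"
)

# Exponential event times with administrative censoring at 36 months
event_time    <- rexp(n, rate = 0.03)   # assumed constant hazard per month
sim_ipd$time  <- pmin(event_time, 36)
sim_ipd$event <- as.integer(event_time <= 36)
```

More elaborate simulations could tie the hazard to the baseline covariates, but this structure is sufficient for demonstrating the matching and analysis steps.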

Show R Code
# Establish the base path to the data directory
data_path <- here::here("files")

3.2 Load Summary-Level Data

Show R Code
subD <- "Summary level data"
trial <- "atezo+vemu+cobi"
# Step 1: Load the dataset
imspire150 <- read_csv(file = paste0(data_path, "/", subD, "/", trial, "/SLD_safety.csv"))

# Step 2: Create the GT table with bolded title
imspire150_table <- imspire150 %>%
  gt() %>%
  tab_header(
    title = html("<b>Preview of IMspire150 Summary Level and Safety Data</b>")
  ) %>%
  cols_align(align = "center", columns = everything())

# Step 3: Render inside scrollable HTML div
htmltools::div(
  style = "overflow-x: auto; height: 400px;",
  imspire150_table
)
Preview of IMspire150 Summary Level and Safety Data
characteristics trial arm mean sd type n
sex_female IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.41406250 NA categorical 256
sex_male IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.58593750 NA categorical 256
race_white IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.94921875 NA categorical 256
race_race_other IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.05078125 NA categorical 256
geo_europe IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.79296875 NA categorical 256
geo_north_americas IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.05078125 NA categorical 256
geo_other IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.15625000 NA categorical 256
ecog_ecog_0 IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.75781250 NA categorical 256
ecog_ecog_1 IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.24218750 NA categorical 256
age_above_54 IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.50000000 NA categorical 256
baseline_met_stage_m1b IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.22265625 NA categorical 256
baseline_met_stage_m01a IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.21484375 NA categorical 256
baseline_met_stage_m1c1d IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.56250000 NA categorical 256
ldh_cat_ldh_great_uln IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.32812500 NA categorical 256
ldh_cat_ldh_lower_equal_uln IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.67187500 NA categorical 256
bmhist_history_of_brain_metastases IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.01953125 NA categorical 256
bmhist_no_history_of_brain_metastases IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.98046875 NA categorical 256
immunotherapy IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.16015625 NA categorical 256
tum_burd_less_44 IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.42578125 NA categorical 256
tum_burd_great_44 IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.57421875 NA categorical 256
tum_burd_miss IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.00000000 NA categorical 256
ORR_investigator IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.66406250 NA categorical 256
AE_any IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 1.00000000 NA categorical 256
AE_g3 IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.74891775 NA categorical 231
AE_disc IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.38095238 NA categorical 231
ae_rash IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.45454545 NA categorical 231
ae_diarrhoea IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.50216450 NA categorical 231
ae_fatigue IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.30303030 NA categorical 231
ae_arthralgia IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.44588745 NA categorical 231
ae_pruritus IMspire150 (Ascierto 2022) Atezolizumab + vemurafenib + cobimetinib 0.28138528 NA categorical 231
Show R Code
subD <- "Summary level data"
trial <- "dab+tram"
# Step 1: Load the dataset
combiD_V <- read_csv(file = paste0(data_path, "/", subD, "/", trial, "/SLD_safety.csv"))

# Step 2: Create the GT table with bolded title and centered columns
combiD_V_table <- combiD_V %>%
  gt() %>%
  tab_header(
    title = html("<b>Preview of COMBI-d and COMBI-v Summary Level and Safety Data</b>")
  ) %>%
  cols_align(align = "center", columns = everything())

# Step 3: Render inside scrollable HTML div
htmltools::div(
  style = "overflow-x: auto; height: 400px;",
  combiD_V_table
)
Preview of COMBI-d and COMBI-v Summary Level and Safety Data
characteristics trial arm mean sd type n
sex_female COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.433392540 NA categorical 563
sex_male COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.566607460 NA categorical 563
race_white COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.990000000 NA categorical 563
race_race_other COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.000000000 NA categorical 563
geo_australia_new_zealand COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
geo_europe COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
geo_north_americas COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
geo_other COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
ethnicity_hispanic_latino COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
ethnicity_not_hispanic_latino COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
smoke_current_former_smoker COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
smoke_never_smoked COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
ecog_ecog_0 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.715808171 NA categorical 563
ecog_ecog_1 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.275310835 NA categorical 563
ecog_missing COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.008880995 NA categorical 563
baseline_met_stage_m0 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.033747780 NA categorical 563
baseline_met_stage_m1a COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.133214920 NA categorical 563
baseline_met_stage_m1b COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.186500888 NA categorical 563
baseline_met_stage_m1c COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.644760213 NA categorical 563
baseline_met_stage_m1d COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.000000000 NA categorical 563
baseline_met_stage_m01a COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.166962700 NA categorical 563
baseline_met_stage_m01a1b COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.353463588 NA categorical 563
baseline_met_stage_m1c1d COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.644760213 NA categorical 563
ldh_cat_ldh_great_uln COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.344582593 NA categorical 563
ldh_cat_ldh_lower_equal_uln COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.648312611 NA categorical 563
ldh_cat2_ldh_great_2xuln COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.113676732 NA categorical 563
bmhist_history_of_brain_metastases COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
bmhist_no_history_of_brain_metastases COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
n_disease_sites_n_disease_sites_eql_above_3 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.488454707 NA categorical 563
n_disease_sites_n_disease_sites_below_3 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.509769094 NA categorical 563
prior_antineoplastic_not_received COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.790000000 NA categorical 563
prior_antineoplastic_received COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.210000000 NA categorical 563
prior_treatment_anti_ctla4 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
prior_treatment_anti_pd1 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
prior_treatment_combo_anti_ctla4_anti_pd1 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
prior_treatment_combo_braf_mek_nras_inhibitor COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
prior_treatment_interferon COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
prior_treatment_investigational_antineoplastic_agents COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib NA NA categorical 563
immunotherapy COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.207815275 NA categorical 563
ORR_investigator COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.680284192 NA categorical 563
age_below_55 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.500000000 NA categorical 563
metsite_non_visceral COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.210000000 NA categorical 563
metsite_visceral COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.790000000 NA categorical 563
metsite_visceral_disease_status_not_report COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.000000000 NA categorical 563
AE_any COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.980322004 NA categorical 559
AE_g3 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.592128801 NA categorical 559
AE_disc COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.177101968 NA categorical 559
ae_rash COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.282647585 NA categorical 559
ae_diarrhoea COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.355992844 NA categorical 559
ae_fatigue COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.354203936 NA categorical 559
ae_arthralgia COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.288014311 NA categorical 559
TRAE_any COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.980322004 NA categorical 559
TRAE_g3 COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.592128801 NA categorical 559
TRAE_disc COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.177101968 NA categorical 559
TRAE_rash COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.282647585 NA categorical 559
TRAE_diarrhoea COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.355992844 NA categorical 559
TRAE_fatigue COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.354203936 NA categorical 559
TRAE_arthralgia COMBI-d+v (Robert et al. 2019) Dabrafenib + trametinib 0.288014311 NA categorical 559
Show R Code
subD <- "Summary level data"
trial <- "enco+bini"
# Load the dataset
columbus <- read_csv(file = paste0(data_path, "/", subD, "/", trial, "/SLD.csv"))

# Create GT table with bold title and centered columns
columbus_table <- columbus %>%
  gt() %>%
  tab_header(
    title = html("<b>Preview of COLUMBUS Summary Level and Safety Data</b>")
  ) %>%
  cols_align(align = "center", columns = everything())

# Render with scroll
htmltools::div(
  style = "overflow-x: auto; height: 400px;",
  columbus_table
)
Preview of COLUMBUS Summary Level and Safety Data
characteristics trial arm mean sd type n
sex_female COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.4010417 NA categorical 192
sex_male COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.5989583 NA categorical 192
race_white COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
race_race_other COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
geo_australia_new_zealand COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
geo_europe COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
geo_north_americas COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
geo_other COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
ethnicity_hispanic_latino COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
ethnicity_not_hispanic_latino COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
smoke_current_former_smoker COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
smoke_never_smoked COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
ecog_ecog_0 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.7083333 NA categorical 192
ecog_ecog_1 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.2916667 NA categorical 192
ecog_missing COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.0000000 NA categorical 192
baseline_met_stage_m0 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.0468750 NA categorical 192
baseline_met_stage_m1a COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.1354167 NA categorical 192
baseline_met_stage_m1b COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.1770833 NA categorical 192
baseline_met_stage_m1c COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.6406250 NA categorical 192
baseline_met_stage_m1d COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.0000000 NA categorical 192
baseline_met_stage_m01a COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.1822917 NA categorical 192
baseline_met_stage_m01a1b COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.3593750 NA categorical 192
baseline_met_stage_m1c1d COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.6406250 NA categorical 192
ldh_cat_ldh_great_uln COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.2864583 NA categorical 192
ldh_cat_ldh_lower_equal_uln COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.7135417 NA categorical 192
ldh_cat2_ldh_great_2xuln COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
bmhist_history_of_brain_metastases COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
bmhist_no_history_of_brain_metastases COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
n_disease_sites_n_disease_sites_eql_above_3 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
n_disease_sites_n_disease_sites_below_3 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
n_organs_0 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.0000000 NA categorical 192
n_organs_1 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.2447917 NA categorical 192
n_organs_2 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.3020833 NA categorical 192
n_organs_equal_above_3 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.4531250 NA categorical 192
prior_antineoplastic_not_received COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
prior_antineoplastic_received COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
prior_treatment_anti_ctla4 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
prior_treatment_anti_pd1 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
prior_treatment_combo_anti_ctla4_anti_pd1 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
prior_treatment_combo_braf_mek_nras_inhibitor COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
prior_treatment_interferon COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.2656250 NA categorical 192
prior_treatment_investigational_antineoplastic_agents COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
immunotherapy COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.2968750 NA categorical 192
ORR_bicr COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.6354167 NA categorical 192
ORR_investigator COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.7552083 NA categorical 192
age_below_57 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.5000000 NA categorical 192
metsite_non_visceral COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
metsite_visceral COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
metsite_visceral_disease_status_not_report COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib NA NA categorical 192
AE_any COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.9843750 NA categorical 192
AE_g3 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.6822917 NA categorical 192
AE_disc COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.1041667 NA categorical 192
ae_rash COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.1614583 NA categorical 192
ae_diarrhoea COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.3854167 NA categorical 192
ae_fatigue COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.2968750 NA categorical 192
ae_arthralgia COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.2864583 NA categorical 192
ae_pruritus COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.1250000 NA categorical 192
TRAE_any COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.9843750 NA categorical 192
TRAE_g3 COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.6822917 NA categorical 192
TRAE_disc COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.1041667 NA categorical 192
TRAE_rash COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.1614583 NA categorical 192
TRAE_diarrhoea COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.3854167 NA categorical 192
TRAE_fatigue COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.2968750 NA categorical 192
TRAE_arthralgia COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.2864583 NA categorical 192
TRAE_pruritus COLUMBUS (Dummer et al. 2022) Encorafenib + binimetinib 0.1250000 NA categorical 192
Show R Code
subD <- "Summary level data"
trial <- "vem+cobi"
# Load the dataset
cobrim <- read_csv(file = paste0(data_path, "/", subD, "/", trial, "/SLD.csv"))

# Create GT table with bold title and centered columns
cobrim_table <- cobrim %>%
  gt() %>%
  tab_header(
    title = html("<b>Preview of coBRIM Summary Level and Safety Data</b>")
  ) %>%
  cols_align(align = "center", columns = everything())

# Render with scroll
htmltools::div(
  style = "overflow-x: auto; height: 400px;",
  cobrim_table
)
Preview of coBRIM Summary Level and Safety Data
characteristics trial arm mean sd type n
sex_female coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.408906883 NA categorical 247
sex_male coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.591093117 NA categorical 247
race_white coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.919028340 NA categorical 247
race_race_other coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.080971660 NA categorical 247
geo_australia_new_zealand coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
geo_europe coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.736842105 NA categorical 247
geo_north_americas coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.101214575 NA categorical 247
geo_other coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.161943320 NA categorical 247
ethnicity_hispanic_latino coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
ethnicity_not_hispanic_latino coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
smoke_current_former_smoker coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
smoke_never_smoked coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
ecog_ecog_0 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.744939271 NA categorical 247
ecog_ecog_1 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.238866397 NA categorical 247
ecog_missing coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.016194332 NA categorical 247
baseline_met_stage_m0 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.085020243 NA categorical 247
baseline_met_stage_m1a coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.161943320 NA categorical 247
baseline_met_stage_m1b coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.161943320 NA categorical 247
baseline_met_stage_m1c coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.591093117 NA categorical 247
baseline_met_stage_m1d coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.000000000 NA categorical 247
baseline_met_stage_m01a coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.246963563 NA categorical 247
baseline_met_stage_m01a1b coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.408906883 NA categorical 247
baseline_met_stage_m1c1d coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.591093117 NA categorical 247
ldh_cat_ldh_great_uln coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.453441296 NA categorical 247
ldh_cat_ldh_lower_equal_uln coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.527103452 NA categorical 247
ldh_cat_ldh_missing coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.019455253 NA categorical 247
ldh_cat2_ldh_great_2xuln coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
bmhist_history_of_brain_metastases coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.004048583 NA categorical 247
bmhist_no_history_of_brain_metastases coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.995951417 NA categorical 247
n_disease_sites_n_disease_sites_eql_above_3 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
n_disease_sites_n_disease_sites_below_3 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
prior_antineoplastic_not_received coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
prior_antineoplastic_received coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
prior_treatment_anti_ctla4 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
prior_treatment_anti_pd1 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
prior_treatment_combo_anti_ctla4_anti_pd1 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
prior_treatment_combo_braf_mek_nras_inhibitor coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
prior_treatment_interferon coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
prior_treatment_investigational_antineoplastic_agents coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
immunotherapy coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
ORR_investigator coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.696356275 NA categorical 247
age_below_56 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.500000000 NA categorical 247
metsite_non_visceral coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
metsite_visceral coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
metsite_visceral_disease_status_not_report coBRIM Cobimetinib + vemurafenib (Ascierto 2021) NA NA categorical 247
AE_any coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.991935484 NA categorical 248
AE_g3 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.758064516 NA categorical 248
AE_disc coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.270161290 NA categorical 248
ae_rash coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.419354839 NA categorical 248
ae_diarrhoea coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.612903226 NA categorical 248
ae_fatigue coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.375000000 NA categorical 248
ae_arthralgia coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.387096774 NA categorical 248
ae_pruritus coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.205645161 NA categorical 248
TRAE_any coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.991935484 NA categorical 248
TRAE_g3 coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.758064516 NA categorical 248
TRAE_disc coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.270161290 NA categorical 248
TRAE_rash coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.419354839 NA categorical 248
TRAE_diarrhoea coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.612903226 NA categorical 248
TRAE_fatigue coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.375000000 NA categorical 248
TRAE_arthralgia coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.387096774 NA categorical 248
TRAE_pruritus coBRIM Cobimetinib + vemurafenib (Ascierto 2021) 0.205645161 NA categorical 248

3.3 Simulate NIVO+RELA Patient-Level Dataset

The following simulation creates pseudo-IPD for the NIVO+RELA cohort based on published summary statistics from RELATIVITY-047. The dataset includes:

  • Baseline characteristics (used in matching)
  • Binary outcome: Investigator-assessed objective response rate (ORR)

We consider all baseline characteristics that were used for matching in the MAIC in our main publication.

First, we set a seed value for reproducibility. Then, we set the cohort sample size and summary statistics (in this case, proportions) for each of the baseline characteristics and for the binary outcome. The summary statistics are those from the actual RELATIVITY-047 trial. Finally, each baseline characteristic and ORR is simulated independently for each patient by drawing from a binomial distribution (size 1) with the published proportion as the success probability.

This creates the patient-level dataset including baseline characteristics and one binary outcome.

👇 Expand the code chunk below to view the code used to create pseudo-IPD for RELATIVITY-047

Show R Code
# Set seed for reproducibility
set.seed(42)

# Summary-level parameters
## Cohort sample size ##
N_patients_NR <- 136 # NR denotes Nivolumab-Relatlimab treated patients from RELATIVITY-047 with BRAF mutations

## Baseline characteristics ##
age_above_55_NR <- 0.404 # 40.4% of patients were aged above 55 years
female_pc_NR <- 0.485 # pc represents Proportion of the Cohort
ecog_0_pc_NR <- 0.706 # 70.6% of patients had ECOG 0
ldh_great_uln_pc_NR <- 0.338 # 33.8% of patients had LDH > ULN
disease_sites_above3_pc_NR <- 0.419 # 41.9% of patients had 3 or more sites of disease
mets_M0_M1a_pc_NR <- 0.309 # 30.9% of patients had M0 or M1a disease
mets_M1b_pc_NR <- 0.25 # 25% of patients had M1b disease
immunotherapy_NR <- 0.118 # 11.8% of patients had prior immunotherapy

## Outcomes ##
### Binary outcomes (ORR) ##
ORR_investigator_NR <- 0.480184 # this value was estimated, not taken from data

## Create patient-level dataset including baseline characteristics and ORR ##
NR.IPD <- data.frame(
               # Sample size 
               n = N_patients_NR, 
               # Patient IDs
               ID = seq(1, N_patients_NR, by = 1), 
               # Baseline characteristics
               age_above_55 = rbinom(N_patients_NR, 1, age_above_55_NR), 
               gender_female = rbinom(N_patients_NR, 1, female_pc_NR), 
               ecog_0 = rbinom(N_patients_NR, 1, ecog_0_pc_NR), 
               ldh_great_uln = rbinom(N_patients_NR, 1, ldh_great_uln_pc_NR), 
               disease_sites_above3 = rbinom(N_patients_NR, 1, disease_sites_above3_pc_NR), 
               mets_M0_M1a = rbinom(N_patients_NR, 1, mets_M0_M1a_pc_NR), 
               immunotherapy = rbinom(N_patients_NR, 1, immunotherapy_NR),
               # Binary outcomes
               ORR_investigator = rbinom(N_patients_NR, 1, ORR_investigator_NR),
               # Treatment
               trt = "NIVO+RELA")

The synthetic dataset is shown in tabular form below.

Show R Code
# Define a named vector mapping variable names to readable labels
var_labels <- c(
  age_above_55 = "Age > 55",
  gender_female = "Female",
  ecog_0 = "ECOG Performance Status 0",
  ldh_great_uln = "LDH > Upper Limit of Normal",
  disease_sites_above3 = "≥3 Disease Sites",
  mets_M0_M1a = "M0/M1a Metastases",
  immunotherapy = "Prior Immunotherapy",
  ORR_investigator = "Objective Response Rate (Investigator-Assessed)"
)

# Rename columns using `rename_with()`
NR.IPD_readable <- NR.IPD %>%
  rename_with(~ sapply(.x, function(col) ifelse(col %in% names(var_labels), var_labels[[col]], col)), everything())

# Step 1: Create the GT table
ipd_table <- NR.IPD_readable %>%
  gt() %>%
  tab_header(title = html("<b>Simulated Patient-Level Data for NIVO+RELA</b>")) %>%
  cols_align(align = "center", columns = everything())

# Step 2: Render the table inside a scrollable HTML div
htmltools::div(
  style = "overflow-x: auto; height: 400px;",
  ipd_table
)
Simulated Patient-Level Data for NIVO+RELA
n ID Age > 55 Female ECOG Performance Status 0 LDH > Upper Limit of Normal ≥3 Disease Sites M0/M1a Metastases Prior Immunotherapy Objective Response Rate (Investigator-Assessed) trt
136 1 1 1 1 0 0 0 1 0 NIVO+RELA
136 2 1 0 1 0 1 1 0 1 NIVO+RELA
136 3 0 1 0 0 1 1 0 0 NIVO+RELA
136 4 1 1 1 0 0 1 0 1 NIVO+RELA
136 5 1 0 1 0 0 0 0 1 NIVO+RELA
136 6 0 0 1 1 0 1 0 1 NIVO+RELA
136 7 1 0 1 0 1 0 0 1 NIVO+RELA
136 8 0 1 1 0 0 0 0 1 NIVO+RELA
136 9 1 1 1 0 1 1 0 0 NIVO+RELA
136 10 1 1 1 1 0 0 1 1 NIVO+RELA
136 11 0 0 1 0 0 0 0 0 NIVO+RELA
136 12 1 1 0 0 0 0 0 1 NIVO+RELA
136 13 1 0 1 0 1 1 0 0 NIVO+RELA
136 14 0 0 1 0 0 1 0 1 NIVO+RELA
136 15 0 1 1 1 1 0 0 0 NIVO+RELA
136 16 1 0 1 1 1 1 0 0 NIVO+RELA
136 17 1 1 1 0 1 0 0 1 NIVO+RELA
136 18 0 0 1 0 0 1 0 0 NIVO+RELA
136 19 0 1 0 0 0 1 0 0 NIVO+RELA
136 20 0 1 0 0 0 0 0 0 NIVO+RELA
136 21 1 0 1 1 0 1 0 1 NIVO+RELA
136 22 0 0 1 0 0 0 0 1 NIVO+RELA
136 23 1 0 1 0 1 0 1 0 NIVO+RELA
136 24 1 1 1 0 0 1 0 0 NIVO+RELA
136 25 0 1 0 0 0 1 0 0 NIVO+RELA
136 26 0 1 1 0 1 1 0 0 NIVO+RELA
136 27 0 1 1 1 1 0 0 1 NIVO+RELA
136 28 1 0 1 0 0 0 0 0 NIVO+RELA
136 29 0 1 1 0 1 0 0 0 NIVO+RELA
136 30 1 0 1 0 0 0 0 0 NIVO+RELA
136 31 1 0 1 0 0 1 0 0 NIVO+RELA
136 32 1 0 1 0 1 1 0 1 NIVO+RELA
136 33 0 0 0 0 0 0 0 0 NIVO+RELA
136 34 1 0 0 1 0 1 0 0 NIVO+RELA
136 35 0 1 1 0 1 1 0 1 NIVO+RELA
136 36 1 0 1 1 0 1 0 0 NIVO+RELA
136 37 0 0 1 0 0 0 0 0 NIVO+RELA
136 38 0 0 1 0 1 0 0 1 NIVO+RELA
136 39 1 0 1 0 1 0 0 1 NIVO+RELA
136 40 1 0 0 1 1 0 0 0 NIVO+RELA
136 41 0 1 1 1 0 0 0 1 NIVO+RELA
136 42 0 1 1 0 0 0 1 1 NIVO+RELA
136 43 0 1 1 0 1 0 0 0 NIVO+RELA
136 44 1 1 0 0 1 0 1 0 NIVO+RELA
136 45 0 1 1 1 0 0 1 1 NIVO+RELA
136 46 1 1 1 0 1 1 0 0 NIVO+RELA
136 47 1 0 0 0 0 0 0 0 NIVO+RELA
136 48 1 0 0 0 1 0 0 0 NIVO+RELA
136 49 1 1 1 0 0 0 0 1 NIVO+RELA
136 50 1 1 1 1 1 0 0 0 NIVO+RELA
136 51 0 1 1 1 1 0 0 1 NIVO+RELA
136 52 0 1 1 1 0 1 0 1 NIVO+RELA
136 53 0 0 1 0 0 1 0 0 NIVO+RELA
136 54 1 0 0 0 0 1 0 0 NIVO+RELA
136 55 0 0 1 1 0 0 0 1 NIVO+RELA
136 56 1 1 0 0 0 0 0 1 NIVO+RELA
136 57 1 0 0 0 0 0 0 0 NIVO+RELA
136 58 0 0 0 0 0 0 0 1 NIVO+RELA
136 59 0 0 0 0 1 0 0 0 NIVO+RELA
136 60 0 0 1 0 0 1 0 1 NIVO+RELA
136 61 1 1 1 0 0 1 0 0 NIVO+RELA
136 62 1 0 1 1 0 0 0 0 NIVO+RELA
136 63 1 1 1 0 1 0 1 0 NIVO+RELA
136 64 0 1 1 0 0 1 0 1 NIVO+RELA
136 65 1 1 1 1 0 1 1 0 NIVO+RELA
136 66 0 1 1 1 0 0 0 0 NIVO+RELA
136 67 0 1 0 0 0 0 0 1 NIVO+RELA
136 68 1 0 1 0 0 1 0 0 NIVO+RELA
136 69 1 0 1 0 1 1 1 1 NIVO+RELA
136 70 0 0 1 1 0 0 0 0 NIVO+RELA
136 71 0 1 0 0 1 0 0 0 NIVO+RELA
136 72 0 0 1 1 1 0 0 0 NIVO+RELA
136 73 0 0 1 0 0 0 0 0 NIVO+RELA
136 74 0 0 1 0 0 0 0 0 NIVO+RELA
136 75 0 1 0 1 0 0 0 0 NIVO+RELA
136 76 1 1 1 0 0 0 0 0 NIVO+RELA
136 77 0 0 0 1 0 0 0 1 NIVO+RELA
136 78 0 0 0 0 1 0 0 1 NIVO+RELA
136 79 0 0 1 1 0 1 0 1 NIVO+RELA
136 80 0 1 1 0 1 1 1 0 NIVO+RELA
136 81 0 1 1 0 0 1 0 0 NIVO+RELA
136 82 0 0 1 0 1 1 0 0 NIVO+RELA
136 83 0 1 0 1 0 0 0 0 NIVO+RELA
136 84 1 1 1 0 0 0 0 1 NIVO+RELA
136 85 1 0 1 1 1 0 0 1 NIVO+RELA
136 86 0 0 0 0 0 0 0 1 NIVO+RELA
136 87 0 1 1 0 0 1 0 0 NIVO+RELA
136 88 0 1 1 0 0 0 0 0 NIVO+RELA
136 89 0 0 1 1 0 0 0 1 NIVO+RELA
136 90 0 1 1 0 0 0 1 0 NIVO+RELA
136 91 1 0 1 0 0 0 0 0 NIVO+RELA
136 92 0 0 0 0 1 0 0 1 NIVO+RELA
136 93 0 0 1 0 0 0 0 1 NIVO+RELA
136 94 1 0 1 0 0 0 0 0 NIVO+RELA
136 95 1 0 1 0 1 1 0 1 NIVO+RELA
136 96 1 0 0 1 1 0 0 1 NIVO+RELA
136 97 0 0 1 0 0 1 1 1 NIVO+RELA
136 98 0 0 1 1 0 1 1 0 NIVO+RELA
136 99 1 1 1 0 0 1 0 1 NIVO+RELA
136 100 1 1 1 0 1 1 0 0 NIVO+RELA
136 101 1 0 0 1 0 0 1 0 NIVO+RELA
136 102 0 1 1 0 1 0 0 1 NIVO+RELA
136 103 0 1 0 0 0 0 0 0 NIVO+RELA
136 104 0 0 0 1 0 0 0 1 NIVO+RELA
136 105 1 0 1 0 1 0 0 1 NIVO+RELA
136 106 1 1 1 0 1 0 1 1 NIVO+RELA
136 107 1 0 1 0 0 0 1 0 NIVO+RELA
136 108 1 0 0 0 1 0 0 0 NIVO+RELA
136 109 0 1 1 0 1 0 0 0 NIVO+RELA
136 110 0 0 1 0 1 0 0 1 NIVO+RELA
136 111 1 0 1 0 0 1 0 0 NIVO+RELA
136 112 1 0 1 0 0 0 0 0 NIVO+RELA
136 113 1 0 1 1 0 0 0 0 NIVO+RELA
136 114 0 1 0 1 0 0 0 1 NIVO+RELA
136 115 0 0 1 1 0 0 0 1 NIVO+RELA
136 116 0 0 0 0 0 0 1 0 NIVO+RELA
136 117 0 0 1 0 1 0 1 0 NIVO+RELA
136 118 0 1 1 1 1 0 1 1 NIVO+RELA
136 119 1 0 0 0 0 1 0 1 NIVO+RELA
136 120 1 1 1 1 1 0 0 0 NIVO+RELA
136 121 0 0 1 0 0 1 0 0 NIVO+RELA
136 122 0 1 1 0 1 1 0 1 NIVO+RELA
136 123 0 1 0 1 0 0 0 0 NIVO+RELA
136 124 0 1 1 0 0 0 0 1 NIVO+RELA
136 125 1 1 0 1 1 0 0 1 NIVO+RELA
136 126 0 0 0 0 0 0 0 0 NIVO+RELA
136 127 1 0 1 0 0 0 0 0 NIVO+RELA
136 128 1 1 1 0 1 0 0 0 NIVO+RELA
136 129 0 0 1 0 0 0 0 0 NIVO+RELA
136 130 1 0 1 0 0 0 0 1 NIVO+RELA
136 131 1 1 1 0 0 0 0 0 NIVO+RELA
136 132 1 0 1 0 0 1 0 1 NIVO+RELA
136 133 1 1 0 1 0 0 0 1 NIVO+RELA
136 134 1 1 1 1 0 0 0 0 NIVO+RELA
136 135 1 0 0 0 0 1 0 1 NIVO+RELA
136 136 0 1 1 0 0 0 0 1 NIVO+RELA

As shown in the summary table below, the synthetic data align with the IPD reported in Table 1 of the manuscript, with small deviations arising from the random generation process used in the code above. The dataset therefore serves as pseudo-IPD for this worked example.

Show R Code
NR.IPD_readable %>%
  select(-ID, -trt) %>%  # Exclude ID and treatment column
  tbl_summary(
    statistic = all_categorical() ~ "{n} ({p}%)",  # Ensures n (%) format
    missing = "no"
  ) %>%
 # add_n() %>%  # Adds "N = 136" at the top but removes redundant row below
  modify_header(label = "**Characteristic**") %>%
  modify_spanning_header(all_stat_cols() ~ "**NIVO+RELA**") %>%
  modify_footnote(update = everything() ~ "n (%)") %>%  # Ensure proper footnote
  bold_labels() %>%
  modify_table_body(~ .x %>% filter(variable != "n"))  # Removes the redundant first two rows
Characteristic¹
NIVO+RELA
N = 136¹
Age > 55 65 (48%)
Female 62 (46%)
ECOG Performance Status 0 99 (73%)
LDH > Upper Limit of Normal 39 (29%)
≥3 Disease Sites 48 (35%)
M0/M1a Metastases 45 (33%)
Prior Immunotherapy 19 (14%)
Objective Response Rate (Investigator-Assessed) 60 (44%)
1 n (%)
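As a complementary numeric check, the realised sample proportions can be compared directly against the target inputs. The sketch below regenerates binary draws with the same published proportions; because the draw order differs from the chunk above, individual values will not match `NR.IPD` exactly, but the sample proportions behave the same way:

```r
# Quick numeric check (illustrative sketch): regenerate draws with the
# published proportions and compare realised sample proportions to targets
set.seed(42)
N <- 136
targets <- c(age_above_55 = 0.404, gender_female = 0.485, ecog_0 = 0.706,
             ldh_great_uln = 0.338, disease_sites_above3 = 0.419,
             mets_M0_M1a = 0.309, immunotherapy = 0.118)
sim <- sapply(targets, function(p) rbinom(N, 1, p))  # 136 x 7 matrix of 0/1 draws
round(cbind(target = targets, observed = colMeans(sim),
            diff = colMeans(sim) - targets), 3)
```

Deviations of a few percentage points are expected at this sample size (the standard error of a proportion with n = 136 is roughly 0.03 to 0.04).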

4 Reconstruction of IPD from Published Kaplan-Meier Curves

4.1 Overview

This section presents a reproducible R-based implementation of the method developed by Guyot et al. (2012) for reconstructing individual patient-level data from published Kaplan-Meier survival curves.

This reconstruction is essential for indirect comparisons—including MAIC or network meta-analyses—when patient-level data are not publicly available.

The approach requires three key inputs:

  1. Published Kaplan-Meier Curve Image
    • An image or PDF of the KM curve from the original publication.
  2. Digitized Kaplan-Meier Curve Data
    • Extracted (x, y) coordinates from the survival curve using a digitization tool, such as WebPlotDigitizer.
    • Format: CSV with two columns:
      • x: Time points along the x-axis (e.g., months)
      • Curve1: Corresponding survival probabilities.
  3. Number-at-Risk Table
    • Extracted counts of patients at risk at key time points.
    • Format: CSV with the following columns:
      • Time: Time points where the number at risk is reported
      • Nrisk: Number of patients at risk
      • Median Survival, CI, Number of Events, and Unit (as reported)

We then apply the reconstruction algorithm to generate individual-level event data suitable for survival analysis.

Reference:
Guyot P, Ades AE, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Medical Research Methodology. 2012;12:9. doi:10.1186/1471-2288-12-9
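To make the expected file formats concrete, the sketch below builds the two inputs by hand (all values are made up for illustration; in practice both would be read from the digitization CSVs):

```r
# Illustrative structure of the two reconstruction inputs (made-up values)
digC_example <- data.frame(
  x      = c(0, 2.5, 5.0, 7.5, 10.0),    # time read off the x-axis (e.g., months)
  Curve1 = c(1, 0.95, 0.88, 0.80, 0.71)  # survival probability read off the y-axis
)

nrisk_example <- data.frame(
  Time  = c(0, 4, 8),      # time points reported under the KM plot
  Nrisk = c(192, 160, 116) # number of patients still at risk at each time point
)
```

Both objects should start at time zero, the survival probabilities should be non-increasing, and the at-risk counts should decrease over time.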

4.2 Visual Reference: Published Kaplan-Meier Curve (Figure S1)

For this example, we use Figure S1 from the supplementary material of Dummer et al., 2018 (Lancet Oncology). This figure shows Progression-Free Survival (PFS) curves comparing COMBO450 to VEM or ENCO300 arms.

This figure is the source for our digitization process. We use WebPlotDigitizer to extract the data points, followed by reconstruction using the Guyot method.

Reference:
Dummer R, et al. Encorafenib plus binimetinib versus vemurafenib or encorafenib in patients with BRAF-mutant melanoma (COLUMBUS): a multicentre, open-label, randomised phase 3 trial. Lancet Oncology. 2018;19(5):603-615. doi:10.1016/S1470-2045(18)30142-6

4.3 Load Digitized Kaplan-Meier Curve and Number-at-Risk Data

The reconstruction algorithm requires two input files in .csv format:

  1. Digitized Kaplan-Meier Curve Data — containing time (x) and survival probability (y) points extracted from the published curve (see above).

  2. Number-at-Risk Table — containing the number of patients at risk at specific time points, extracted directly from the publication.

Both files are read using data.table::fread() for efficiency.

Show R Code
# Define file paths using here::here() for portability
# Read digitized KM curve data and number at risk table
data_path <- here::here("files/ENCO+BINI PFS Digitization Example") # Change to accommodate the path of your files
km_file <- file.path(data_path, "PFS_Dummer2018.csv")
file.exists(km_file)
[1] TRUE
Show R Code
nrisk_file <- file.path(data_path, "PFS_Dummer2018_NumAtRisk.csv")
# Load digitized Kaplan-Meier curve data
digC <- fread(km_file) %>%
  mutate(Curve1 = y) %>%  # Rename 'y' to 'Curve1' for clarity
  dplyr::select(x, Curve1)

4.3.1 Inspect Loaded Data

We first inspect the digitized Kaplan-Meier curve data (digC) and the number-at-risk table (nRisk) to confirm successful data loading and correct formatting.

4.3.2 Digitized Kaplan-Meier Curve Data (digC)

Show R Code
# Create the GT Table with column header styling
digC_table <- 
  digC %>%
  gt() %>%
  tab_header(
    title = html("<b>Preview of Digitized Kaplan-Meier Curve Data</b>")
  ) %>%
  cols_align(align = "center", columns = everything()) %>%
  tab_style(
    style = cell_text(
      weight = "bold",
      style = "italic",
      color = "#2C3E50",     # Dark slate blue-gray tone
      font = "Arial"
    ),
    locations = cells_column_labels(columns = everything())
  )

# Render inside scrollable div
htmltools::div(
  style = "overflow-x: auto; height: 400px;",
  digC_table
)
Preview of Digitized Kaplan-Meier Curve Data
x Curve1
0.0000 1.00000
0.7728 1.00000
0.8029 0.99492
1.2575 0.99492
1.3027 0.98858
1.7420 0.98731
1.8327 0.97843
2.5297 0.97800
2.5446 0.97335
2.6204 0.97335
2.6354 0.96827
3.2566 0.96827
3.3018 0.95939
3.5439 0.94924
3.5442 0.94924
3.6045 0.94924
3.6646 0.93147
3.6649 0.93147
3.7252 0.93020
3.7552 0.92005
3.8006 0.91878
3.8307 0.91244
3.8760 0.90863
4.1183 0.90482
4.1184 0.90482
4.7091 0.90102
4.7847 0.89467
4.8753 0.88452
5.1480 0.88452
5.1933 0.87817
5.3145 0.87817
5.4356 0.87437
5.4957 0.85787
5.5260 0.85787
5.5406 0.84010
5.6005 0.81726
5.6012 0.81726
5.6455 0.80203
6.1455 0.80076
6.1604 0.79442
6.4635 0.79442
6.4784 0.78807
7.2510 0.78299
7.2512 0.78299
7.2956 0.75635
7.3116 0.75635
7.3555 0.73223
7.3563 0.73223
7.4313 0.73223
7.4461 0.72081
7.4909 0.70051
7.5067 0.70051
7.6270 0.69036
7.6273 0.69036
8.4603 0.69036
8.4903 0.68147
8.8994 0.68147
8.9446 0.67259
8.9749 0.67259
9.0049 0.66244
9.0958 0.66244
9.1558 0.64086
9.2308 0.61802
9.2316 0.61802
9.3369 0.61675
9.3663 0.58883
10.4116 0.58122
10.4269 0.55000
10.8509 0.55000
10.9264 0.54800
11.0622 0.54800
11.0931 0.54700
11.1834 0.54700
11.2287 0.54695
12.7287 0.54569
12.7588 0.53934
12.8799 0.53807
12.9251 0.52919
12.9857 0.52792
13.0004 0.51396
14.6823 0.51396
14.7275 0.50635
14.8335 0.50635
14.8400 0.50000
14.9089 0.49492
15.4996 0.48604
16.5147 0.48604
16.5898 0.46320
17.8170 0.46066
17.8471 0.45431
17.9380 0.45305
17.9679 0.44162
18.3770 0.44162
18.3916 0.41000
18.7098 0.41000
18.7246 0.40900
20.2397 0.40863
20.2544 0.39594
20.7994 0.37944
20.7998 0.37944
21.9806 0.36041
21.9812 0.36041
22.1169 0.35787
22.1314 0.33503
24.9344 0.33376
24.9774 0.25000
25.9320 0.25000

4.3.3 Number-at-Risk Data (nRisk)

Show R Code
# Load number-at-risk data
nRisk <- fread(nrisk_file) 

nRisk_table <- 
  nRisk %>%
  gt() %>%
  tab_header(
    title = html("<b>Preview of Number-at-Risk Data</b>")
  ) %>%
  cols_align(align = "center", columns = everything()) %>%
  tab_style(
    style = cell_text(
      weight = "bold",
      style = "italic",
      color = "#2C3E50",     # Dark slate blue-gray tone
      font = "Arial"
    ),
    locations = cells_column_labels(columns = everything())
  )

# Render inside scrollable div
htmltools::div(
  style = "overflow-x: auto; height: 400px;",
  nRisk_table
)
Preview of Number-at-Risk Data
Time Nrisk Median Survival CI Number of Events Unit
0 192 14.8 (10.4,18.4) 102 month
4 160 NA NA NA NA
8 116 NA NA NA NA
12 88 NA NA NA NA
16 63 NA NA NA NA
20 30 NA NA NA NA
24 5 NA NA NA NA
28 0 NA NA NA NA
Show R Code
# Reload number-at-risk data, dropping rows where Nrisk is zero (intervals with no patients at risk cannot contribute to the reconstruction)
nRisk <- fread(nrisk_file) %>%
  filter(Nrisk != 0)

Note: These values should align with the number-at-risk figures presented at the bottom of the Kaplan-Meier plot (Figure S1).
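Digitization can also introduce small upward wobbles in the extracted survival probabilities, which a Kaplan-Meier curve cannot have. A common preprocessing step (not necessarily part of the authors' pipeline) is to force monotonicity with a cumulative minimum, sketched below on made-up values:

```r
# Force digitized survival probabilities to be non-increasing in time:
# cummin() replaces any upward wobble with the running minimum so far
surv_raw   <- c(1.00, 0.97, 0.975, 0.90, 0.91, 0.85)  # made-up digitized values
surv_clean <- cummin(surv_raw)
surv_clean  # 1.00 0.97 0.97 0.90 0.90 0.85
```

The same correction would be applied to the `Curve1` column of the digitized data before reconstruction.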

4.4 Reconstruction of IPD

4.4.1 Overview of Reconstruction Algorithm

To recreate individual patient-level data from the digitized Kaplan-Meier curve, we implemented the algorithm from Guyot et al., 2012. This method enables survival analysis when only aggregate survival curves and number-at-risk tables are available.

4.4.1.1 High-Level Workflow and Function Roles

The reconstruction process is modular, with each function handling a specific step:

  1. Align Number-at-Risk Data to KM Curve — Nrisk_low_up()
    • Matches the number-at-risk time points to the digitized survival curve
    • Ensures proper interval mapping for event/censor distribution
  2. Core Reconstruction Logic — reconKMGuyot()
    • Iteratively estimates the number of events and censoring within each interval
    • Reconstructs synthetic event and censoring times for individual patients
    • Outputs a patient-level dataset ready for survival analysis
  3. Flexible Wrapper for Reconstruction — FUN_KM_RECON()
    • Coordinates the overall process
    • Handles input flexibility (with or without number-at-risk table)
    • Calls Nrisk_low_up() and reconKMGuyot() in sequence
    • Returns the reconstructed IPD
  4. Pipeline Execution — process_KM_data()
    • Streamlines function calls
    • Prepares the dataset for downstream survival analysis

4.4.1.2 Summary

Together, these functions produce a synthetic individual patient dataset containing event times and status (event or censor). This dataset can then be used for survival modeling, matching-adjusted indirect comparisons, or other analyses requiring patient-level data.
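Once reconstructed, the pseudo-IPD can be validated by refitting a Kaplan-Meier model and checking summary statistics (e.g., the median) against the published values. The sketch below uses a hypothetical reconstructed data frame with `time` in months and `event` coded 1 for an event and 0 for censoring; the toy values are chosen so the KM median lands at 14.8 months, matching the published median for this arm:

```r
library(survival)

# Hypothetical reconstructed IPD (toy values for illustration only)
recon <- data.frame(
  time  = c(1.2, 3.4, 5.6, 7.1, 9.8, 12.0, 14.8, 18.4, 22.0, 25.9),
  event = c(1,   1,   0,   1,   1,   0,    1,    1,    0,    0)
)

# Refit a Kaplan-Meier model to the reconstructed data
fit <- survfit(Surv(time, event) ~ 1, data = recon)

# Extract the KM median survival; compare against the published median
unname(summary(fit)$table["median"])  # 14.8
```

With real reconstructed data, the refitted curve should closely overlay the digitized curve, and the median and number of events should approximate those reported in the publication.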

4.4.2 Helper Function: Align Number-at-Risk Table (Nrisk_low_up())

This function aligns number-at-risk time points with the digitized survival curve, ensuring accurate distribution of events and censored observations.

👇 Expand the code chunk below to view the function definition:

Show R Code
Nrisk_low_up <- function(
  t.risk0,   # Vector of original time points from the number-at-risk table
  t.S1,      # Vector of time points from the digitized survival (KM) curve
  n.risk0    # Vector of number-at-risk counts at each t.risk0 time point
) {
  
  # Iterate over each number-at-risk interval (excluding first and last) 
  for (i in 2:(length(t.risk0) - 1)) {
    # Identify survival curve time points (t.S1) that fall within the current at-risk interval
    temp <- which((t.risk0[i] <= t.S1) & (t.S1 < t.risk0[i + 1]))
    
    # If no points fall within this interval, report it (printing before the
    # values are overwritten) and mark the at-risk time and count as invalid (-1)
    if (length(temp) == 0) {
      print(paste0("No events/points found between ", t.risk0[i], " and ", t.risk0[i + 1]))
      t.risk0[i] <- -1
      n.risk0[i] <- -1
    }
  }

  # Remove any invalid (-1) time points and counts
  t.risk1 <- t.risk0[t.risk0 != -1]
  n.risk1 <- n.risk0[n.risk0 != -1]

  # For each valid at-risk interval, find the corresponding indices (lower and upper) in t.S1
  lower1 <- sapply(1:(length(t.risk1) - 1), function(i) min(which((t.risk1[i] <= t.S1) & (t.S1 < t.risk1[i + 1]))))
  upper1 <- sapply(1:(length(t.risk1) - 1), function(i) max(which((t.risk1[i] <= t.S1) & (t.S1 < t.risk1[i + 1]))))

  # Handle the last interval separately to ensure coverage to the end of t.S1
  lower1 <- c(lower1, min(which(t.risk1[length(t.risk1)] <= t.S1)))
  upper1 <- c(upper1, max(which(t.risk1[length(t.risk1)] <= t.S1)))

  # Return a list of cleaned and matched at-risk times and their index mappings
  list(
    "lower1" = lower1,  # Lower index in t.S1 for each at-risk interval
    "upper1" = upper1,  # Upper index in t.S1 for each at-risk interval
    "t.risk1" = t.risk1,  # Cleaned vector of valid at-risk time points
    "n.risk1" = n.risk1   # Cleaned vector of valid number-at-risk counts
  )
}
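To make the index mapping concrete, the core computation inside Nrisk_low_up() can be run on toy inputs (hypothetical values, not study data):

```r
# Toy digitized KM times and number-at-risk times (hypothetical values)
t.S1    <- c(0, 1.2, 2.5, 4.1, 6.3, 8.0)  # digitized KM curve time points
t.risk1 <- c(0, 2, 4, 6, 8)               # times at which number-at-risk is reported

# First and last KM indices falling in each at-risk interval (as in Nrisk_low_up)
lower1 <- sapply(1:(length(t.risk1) - 1),
                 function(i) min(which(t.risk1[i] <= t.S1 & t.S1 < t.risk1[i + 1])))
upper1 <- sapply(1:(length(t.risk1) - 1),
                 function(i) max(which(t.risk1[i] <= t.S1 & t.S1 < t.risk1[i + 1])))

# The last interval extends to the end of the digitized curve
lower1 <- c(lower1, min(which(t.risk1[length(t.risk1)] <= t.S1)))
upper1 <- c(upper1, max(which(t.risk1[length(t.risk1)] <= t.S1)))

lower1  # 1 3 4 5 6
upper1  # 2 3 4 5 6
```

Each at-risk interval thus maps to a contiguous run of digitized curve points, over which the reconstruction distributes events and censorings.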

4.4.3 Core Reconstruction Function (reconKMGuyot())

The reconKMGuyot() function performs the core survival reconstruction.

Purpose:

  • Iterates through the number-at-risk intervals
  • Estimates the number of events and censoring between each time point
  • Reconstructs individual patient event and censor times
  • Outputs a synthetic individual patient dataset ready for survival analysis

This is a direct implementation of the algorithm described by Guyot et al. (2012).

👇 Expand the code chunk below to view the function definition:

Show R Code
# Main reconstruction function based on Guyot et al. algorithm
reconKMGuyot <- function(
  tot.events,    # Total number of events reported in the study
  t.S,           # Vector of time points from the digitized KM curve
  S,             # Vector of survival probabilities corresponding to t.S
  t.risk,        # Vector of time points where number-at-risk is reported
  lower,         # Vector of lower indices mapping number-at-risk intervals to t.S
  upper,         # Vector of upper indices mapping number-at-risk intervals to t.S
  n.risk,        # Vector of number-at-risk values at each t.risk time point
  tol            # Tolerance value (typically for convergence, not directly used here)
) {

  
  # Number of risk intervals
  n.int <- length(n.risk)
  # Total number of survival time points
  n.t <- upper[n.int]
  
  # Initialize vectors to store:
  # Estimated number of censorings per interval
  n.censor <- rep(0, n.int - 1)
  # Estimated number at risk at each time point
  n.hat <- rep(n.risk[1] + 1, n.t)
  # Number of censorings at each time point
  cen <- rep(0, n.t)
  # Number of events at each time point
  d <- rep(0, n.t)
  # Estimated Kaplan-Meier survival probability at each time point
  KM.hat <- rep(1, n.t)
  # Track the last time point with an event
  last.i <- rep(1, n.int)
  # Sum of events (for book-keeping, not used here)
  sumdL <- 0
  
  # Start looping through each risk interval to estimate events and censoring
  if (n.int > 1) {
    for (i in 1:(n.int - 1)) {
      
      # Estimate number of censored patients between interval i and i+1
      n.censor[i] <- round(n.risk[i] * S[lower[i + 1]] / S[lower[i]] - n.risk[i + 1])
      
      # Adjust estimated number of censorings iteratively until number at risk aligns
      while ((n.hat[lower[i + 1]] > n.risk[i + 1]) || 
             ((n.hat[lower[i + 1]] < n.risk[i + 1]) && (n.censor[i] > 0))) {
        
        if (n.censor[i] <= 0) {
          # If no censoring, set all censoring counts in this interval to zero
          cen[lower[i]:upper[i]] <- 0
          n.censor[i] <- 0
        } else {
          # Distribute censoring events uniformly across the interval
          cen.t <- t.S[lower[i]] + (1:n.censor[i]) * 
                   (t.S[lower[i + 1]] - t.S[lower[i]]) / (n.censor[i] + 1)
          
          # Bin the censor times into the respective survival time intervals
          cen[lower[i]:upper[i]] <- hist(cen.t, 
                                         breaks = t.S[lower[i]:lower[i + 1]], 
                                         plot = FALSE)$counts
        }
        
        # Set starting number at risk for this interval
        n.hat[lower[i]] <- n.risk[i]
        # Initialize the last time point with event
        last <- last.i[i]
        
        # Iterate through each time point in this interval
        for (k in lower[i]:upper[i]) {
          # Estimate events at time k using conditional survival probability
          d[k] <- if (i == 1 && k == lower[i]) 0 else round(n.hat[k] * (1 - (S[k] / KM.hat[last])))
          
          # Update KM estimate based on new events
          KM.hat[k] <- KM.hat[last] * (1 - (d[k] / n.hat[k]))
          
          # Update number at risk for next time point
          n.hat[k + 1] <- n.hat[k] - d[k] - cen[k]
          
          # Update last event time if an event occurred at k
          if (d[k] != 0) last <- k
        }
        
        # Adjust censoring based on updated number at risk
        n.censor[i] <- n.censor[i] + (n.hat[lower[i + 1]] - n.risk[i + 1])
      }
      
      # Ensure that the estimated number at risk aligns with the known number
      if (n.hat[lower[i + 1]] < n.risk[i + 1]) n.risk[i + 1] <- n.hat[lower[i + 1]]
      
      # Store last event time point for next interval
      last.i[i + 1] <- last
    }
  }
  
  # -------- Reconstruct Individual Patient Data (IPD) ----------
  
  # Initialize IPD time vector (default all to last time point)
  t.IPD <- rep(t.S[n.t], n.risk[1])
  # Initialize event indicator vector (0 = censored by default)
  event.IPD <- rep(0, n.risk[1])
  # Tracker for indexing patients
  k <- 1
  
  # Assign event times to the patients
  for (j in 1:n.t) {
    if (d[j] != 0) {
      t.IPD[k:(k + d[j] - 1)] <- rep(t.S[j], d[j])  # Assign event time
      event.IPD[k:(k + d[j] - 1)] <- rep(1, d[j])   # Mark as event
      k <- k + d[j]  # Move the index forward
    }
  }
  
  # Assign censoring times (placed between survival times)
  for (j in 1:(n.t - 1)) {
    if (cen[j] != 0) {
      # Censoring time is placed mid-interval
      t.IPD[k:(k + cen[j] - 1)] <- rep((t.S[j] + t.S[j + 1]) / 2, cen[j])
      event.IPD[k:(k + cen[j] - 1)] <- rep(0, cen[j])  # Mark as censored
      k <- k + cen[j]
    }
  }
  
  # Create the final IPD dataframe
  IPD <- data.frame(t = t.IPD, ev = event.IPD)
  
  # Final adjustment to ensure the last time aligns with survival curve max
  if (IPD[nrow(IPD), "t"] < t.S[length(t.S)]) {
    IPD[nrow(IPD), "t"] <- t.S[length(t.S)]
  }
  
  # Return both the reconstructed event/censoring table and the IPD
  return(list(dat = cbind(t.S, S, n.hat[1:n.t], d, cen), ipd = IPD))
}
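The central step inside the interval loop — estimating the number of events from the drop in conditional survival — can be checked with a quick numeric example (hypothetical numbers, not study values):

```r
# Suppose 100 patients are at risk, the KM estimate at the last event time is 0.90,
# and the digitized survival at the current time point is 0.85
n.hat.k <- 100
KM.last <- 0.90
S.k     <- 0.85

# Estimated events at this time point (the d[k] update in reconKMGuyot)
d.k <- round(n.hat.k * (1 - S.k / KM.last))
d.k  # 6

# The KM estimate is then updated with these events
KM.new <- KM.last * (1 - d.k / n.hat.k)
round(KM.new, 3)  # 0.846
```

Because of the rounding, the updated KM estimate (0.846) tracks but does not exactly equal the digitized value (0.85); the at-risk alignment loop compensates for such discrepancies across each interval.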

4.4.4 Wrapper Function (FUN_KM_RECON())

The FUN_KM_RECON() function wraps the full reconstruction process. It:

  • Prepares the digitized survival data (rawkmfile)
  • Ensures the time = 0 anchor is present
  • Aligns the number-at-risk table
  • Calls reconKMGuyot() to generate the individual patient data (IPD)

This function provides flexibility to:

  • Run the reconstruction with or without number-at-risk data
  • Use a known total event count (totev) or just the total patient count (totp) if number-at-risk is missing

👇 Expand the code chunk below to view the function definition:

Show R Code
FUN_KM_RECON <- function(
  rawkmfile,      # Digitized Kaplan-Meier survival data (time and probability)
  rawnriskfile,   # Number-at-risk table with time and counts (can be NULL)
  totev,          # Total number of events reported for the study
  totp = 0        # Total number of patients (used if number-at-risk table is missing)
) {
  
  # Rename the digitized KM data columns for clarity: 't' for time and 'p' for survival probability
  colnames(rawkmfile) <- c("t", "p")
  
  # Ensure that time zero is included in the KM data (this anchors the curve at survival probability = 1)
  if (sum(rawkmfile$t == 0) == 0) {
    print("Time zero not present in KM data. Adding time = 0 and survival = 1.")
    new_row <- data.table(t = 0, p = 1)   # If missing, create a new row at time zero with survival = 1
    rawkmfile <- rbind(new_row, rawkmfile) # Add it to the top of the KM dataset
  } else {
    print("Time zero found in KM data. Setting survival at time zero to 1.")
    rawkmfile$p[which(rawkmfile$t == 0)] <- 1  # If time zero exists, make sure survival probability is set to 1
  }
  
  # Check if the number-at-risk table is provided
  if (!is.null(rawnriskfile)) {
    print("Number-at-risk data provided. Proceeding with alignment and full reconstruction.")
    
    # Align the number-at-risk data to the KM data using the helper function
    nrisk_bounds <- Nrisk_low_up(
      t.risk0 = rawnriskfile$Time,  # Times from the number-at-risk table
      t.S1 = rawkmfile$t,           # Times from the digitized KM curve
      n.risk0 = rawnriskfile$Nrisk  # Number-at-risk counts
    )
    
    # Perform the core reconstruction of patient-level data
    ipd_recon <- reconKMGuyot(
      tot.events = totev,                # Total events in the trial
      t.S = rawkmfile$t,                 # KM curve time points
      S = rawkmfile$p,                   # KM survival probabilities
      t.risk = nrisk_bounds$t.risk1,     # Cleaned number-at-risk times
      lower = nrisk_bounds$lower1,       # Lower bounds of survival time mapping
      upper = nrisk_bounds$upper1,       # Upper bounds of survival time mapping
      n.risk = nrisk_bounds$n.risk1,     # Cleaned number-at-risk counts
      tol = 0.01                         # Tolerance for convergence in reconstruction
    )
    print("Reconstruction using number-at-risk table completed.")
    # Return the reconstructed dataset as a list (including survival curve, at-risk table, and IPD)
    return(list(
      surv = rawkmfile,          # The processed KM data
      nrisk = rawnriskfile,      # Original number-at-risk table
      IPD = ipd_recon$ipd        # Reconstructed individual patient data (IPD)
    ))
    
  } else {
    # If no number-at-risk table is provided, reconstruct using only total population count
    print("Number-at-risk data NOT provided. Proceeding with total population count only.")
    
    ipd_recon <- reconKMGuyot(
      tot.events = totev,          # Total events
      t.S = rawkmfile$t,           # KM time points
      S = rawkmfile$p,             # KM survival probabilities
      t.risk = 0,                  # Placeholder since n.risk is not used here
      lower = 1,                   # Starting index for time mapping
      upper = length(rawkmfile$t), # Ending index
      n.risk = totp,               # Total number of patients (if no risk table)
      tol = 0.01                   # Tolerance for reconstruction
    )
    print("Reconstruction using total patient count completed.")
    # Return the reconstructed dataset (same structure)
    return(list(
      surv = rawkmfile,
      nrisk = rawnriskfile,
      IPD = ipd_recon$ipd
    ))
  }
}
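The time-zero anchoring step performed by FUN_KM_RECON() can be illustrated in isolation on a toy digitized fragment (hypothetical values):

```r
# Toy digitized KM fragment that is missing the time-zero anchor (hypothetical)
km <- data.frame(t = c(0.4, 1.1, 2.0), p = c(0.97, 0.90, 0.82))

# Anchor the curve at (t = 0, S = 1) so reconstruction starts from full survival
if (!any(km$t == 0)) {
  km <- rbind(data.frame(t = 0, p = 1), km)
}
km$t[1]  # 0
km$p[1]  # 1
```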

4.4.5 Pipeline Function (process_KM_data())

The process_KM_data() function provides a clean interface to run the reconstruction workflow.

It:

  • Accepts the digitized KM data (digC) and the number-at-risk table (nRisk)
  • Passes these along with the reported number of events to FUN_KM_RECON()
  • Returns the reconstructed individual patient-level dataset (IPD)

This function simplifies execution and keeps the main script readable.

👇 Expand the code chunk below to view the function definition:

Show R Code
process_KM_data <- function(
  digC,   # Digitized Kaplan-Meier curve data (data frame with time and survival probability)
  nRisk   # Number-at-risk table (data frame with time points, at-risk counts, and event count)
) {
  
  # Reconstruct individual patient data (IPD) using the wrapper function FUN_KM_RECON
  IPD_result <- FUN_KM_RECON(
    rawkmfile = digC,                 # Pass digitized KM data
    rawnriskfile = nRisk,             # Pass number-at-risk table
    totev = nRisk$`Number of Events`[1]  # Use the total event count from the first row of nRisk table
  )
  
  # Return the reconstructed IPD as a list (only the individual patient data portion)
  return(list(IPD = IPD_result$IPD))
}

4.4.6 Execute Reconstruction and Preview IPD

We now execute the reconstruction process by applying the pipeline function process_KM_data() to the digitized Kaplan-Meier curve data (digC) and the number-at-risk table (nRisk).

The output is a dataset where each row represents a synthetic individual patient with their corresponding event time and event status:

1 = event occurred

0 = censored observation

👇 Expand the code chunk below to view the execution code:

Show R Code
# Run the reconstruction pipeline
result <- process_KM_data(digC, nRisk)
[1] "Time zero found in KM data. Setting survival at time zero to 1."
[1] "Number-at-risk data provided. Proceeding with alignment and full reconstruction."
[1] "Reconstruction using number-at-risk table completed."
Show R Code
# Extract the reconstructed individual patient dataset
reconstructed_ipd <- result$IPD
reconstructed_ipd <- reconstructed_ipd |> mutate(t = as.numeric(t)) |> arrange(t)

The reconstructed dataset contains 192 individual records.

Show R Code
# Sample gt table
reconstructed_ipd_table <- 
  reconstructed_ipd |> 
  gt() %>%
  tab_header(
    title = html("<b>Preview of Reconstructed IPD</b>")
             )%>%
  cols_align(align = "center", columns = everything()) %>%
  tab_style(
    style = cell_text(
      weight = "bold",
      style = "italic",
      color = "#2C3E50",     # Dark slate blue-gray tone
      font = "Arial"
    ),
    locations = cells_column_labels(columns = everything())
  )

# Wrap gt table in a scrollable div
htmltools::div(
  reconstructed_ipd_table, 
  style = "overflow-x: auto; max-height: 400px;"
  )
Preview of Reconstructed IPD
t ev
0.38640 0
0.38640 0
0.38640 0
0.80290 1
1.03020 0
1.03020 0
1.30270 1
1.52235 0
1.52235 0
1.83270 1
1.83270 1
2.18120 0
2.18120 0
2.18120 0
2.54460 1
2.63540 1
2.94600 0
2.94600 0
2.94600 0
3.30180 1
3.42285 0
3.54390 1
3.54390 1
3.63455 0
3.66460 1
3.66460 1
3.66460 1
3.72520 1
3.75520 1
3.80060 1
3.83070 1
3.99715 0
4.11830 1
4.70910 1
4.74690 0
4.78470 1
4.87530 1
4.87530 1
5.19330 1
5.37505 0
5.49570 1
5.49570 1
5.49570 1
5.54060 1
5.54060 1
5.54060 1
5.60050 1
5.60050 1
5.60050 1
5.60050 1
5.64550 1
5.64550 1
5.64550 1
5.89550 0
6.16040 1
6.47840 1
6.86470 0
6.86470 0
7.25100 1
7.29560 1
7.29560 1
7.29560 1
7.29560 1
7.29560 1
7.35550 1
7.35550 1
7.35550 1
7.35550 1
7.44610 1
7.44610 1
7.49090 1
7.49090 1
7.49090 1
7.62700 1
7.62700 1
8.04380 0
8.49030 1
8.94460 1
8.94460 1
9.00490 1
9.00490 1
9.15580 1
9.15580 1
9.15580 1
9.23080 1
9.23080 1
9.23080 1
9.23080 1
9.28425 0
9.36630 1
9.36630 1
9.36630 1
9.36630 1
9.36630 1
9.88895 0
10.41160 1
10.42690 1
10.42690 1
10.42690 1
10.42690 1
10.42690 1
10.92640 1
10.99430 0
11.97870 0
12.75880 1
12.92510 1
12.92510 1
12.95540 0
13.00040 1
13.00040 1
13.84135 0
13.84135 0
13.84135 0
13.84135 0
13.84135 0
13.84135 0
13.84135 0
14.72750 1
14.78050 0
14.84000 1
14.90890 1
15.20425 0
15.20425 0
15.20425 0
15.49960 1
16.00715 0
16.00715 0
16.00715 0
16.00715 0
16.58980 1
16.58980 1
16.58980 1
17.20340 0
17.20340 0
17.20340 0
17.20340 0
17.20340 0
17.20340 0
17.20340 0
17.20340 0
17.84710 1
17.89255 0
17.96790 1
18.17245 0
18.17245 0
18.17245 0
18.39160 1
18.39160 1
18.39160 1
18.39160 1
18.55070 0
18.55070 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
20.25440 1
20.52690 0
20.52690 0
20.79940 1
21.39020 0
21.39020 0
21.39020 0
21.39020 0
21.39020 0
21.39020 0
21.98060 1
22.13140 1
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
25.93200 0
25.93200 0
25.93200 0
25.93200 0
25.93200 0

4.5 Quality Control and Manual Inspection of Reconstructed Data

While the reconstruction algorithm provides a powerful tool to recreate individual patient data from published Kaplan-Meier curves, manual review and quality control are critical to ensure the reconstructed data aligns with the published study’s reported metrics.

In practice, reconstructed outputs may not perfectly replicate published numbers-at-risk or median survival estimates due to limitations in the digitization process or algorithm assumptions. Manual adjustments help reconcile these discrepancies and improve fidelity.

4.5.1 Examples of Manual Modifications

In this example, we compared the reconstructed IPD output to the publication and made specific adjustments to better match the reported numbers-at-risk and median progression-free survival. Details of these modifications are documented in the file “Reconstructed_IPD_reconciliation.xlsx” (provided for reference, not used in analyses).

🔧 Adjustments to Censoring and Numbers-at-Risk

  • Month 8: The reconstructed number-at-risk exceeded the published value by 1. We corrected this by censoring one additional patient before 8 months (adjusted censoring from 8.04 to 7.9 months).
  • Month 16: The reconstructed number-at-risk exceeded the publication by 4. We shifted four censoring events that originally occurred between 16–20 months to times before 16 months.

🔧 Adjustments to Align Median PFS

  • The reported median PFS was 14.8 months. Our initial reconstruction estimated 14.9 months.
  • To align with the published value, we adjusted two event times from 14.84 and 14.91 months to occur precisely at 14.8 months.
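Checks of this kind can be scripted before making manual edits. A minimal sketch (the qc_check() helper and toy data below are illustrative, not part of the study code):

```r
library(survival)

# Compare the KM median of a reconstructed IPD dataset against a published value.
# ipd: data.frame with columns t (time) and ev (1 = event, 0 = censored)
qc_check <- function(ipd, published_median) {
  fit <- survfit(Surv(t, ev) ~ 1, data = ipd)
  recon_median <- unname(summary(fit)$table["median"])
  c(reconstructed = recon_median,
    published    = published_median,
    difference   = recon_median - published_median)
}

# Toy IPD: ten patients, all with events at months 1..10 (hypothetical)
toy_ipd <- data.frame(t = 1:10, ev = 1)
qc_check(toy_ipd, published_median = 5)
```

The same comparison can be repeated for number-at-risk counts at each reported time point to locate where adjustments (such as those above) are needed.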

4.5.2 Load the Modified Reconstructed Dataset (Post-QC)

Warning

Manual adjustments were made to improve alignment with the published data. These are documented in Reconstructed_IPD_reconciliation.xlsx.

The final version of the reconstructed IPD dataset—incorporating these manual adjustments—is loaded below for transparency:

Show R Code
data_path <- here::here("files/ENCO+BINI PFS Digitization Example/")
file_name <- "Reconstructed_IPD (modified).csv"
file_path <- paste0(data_path, file_name)
#file.exists(file_path)
# Step 1: Load the dataset
modified_reconstructed_ipd <- read_csv(file = file_path)
modified_reconstructed_ipd_table <- modified_reconstructed_ipd |> 
  mutate(t = as.numeric(t)) |> arrange(t) |> 
  gt() %>%
  tab_header(
    title = html("<b>Preview of Reconstructed IPD</b>")
    ) %>%
  cols_align(align = "center", columns = everything()) %>%
  tab_style(
    style = cell_text(
      weight = "bold",
      style = "italic",
      color = "#2C3E50",     # Dark slate blue-gray tone
      font = "Arial"
    ),
    locations = cells_column_labels(columns = everything())
  )

# Wrap gt table in a scrollable div
htmltools::div(modified_reconstructed_ipd_table, style = "overflow-x: auto; max-height: 400px;")
Preview of Reconstructed IPD
t ev
0.38640 0
0.38640 0
0.38640 0
0.80290 1
1.03020 0
1.03020 0
1.30270 1
1.52235 0
1.52235 0
1.83270 1
1.83270 1
2.18120 0
2.18120 0
2.18120 0
2.54460 1
2.63540 1
2.94600 0
2.94600 0
2.94600 0
3.30180 1
3.42285 0
3.54390 1
3.54390 1
3.63455 0
3.66460 1
3.66460 1
3.66460 1
3.72520 1
3.75520 1
3.80060 1
3.83070 1
3.99715 0
4.11830 1
4.70910 1
4.74690 0
4.78470 1
4.87530 1
4.87530 1
5.19330 1
5.37505 0
5.49570 1
5.49570 1
5.49570 1
5.54060 1
5.54060 1
5.54060 1
5.60050 1
5.60050 1
5.60050 1
5.60050 1
5.64550 1
5.64550 1
5.64550 1
5.89550 0
6.16040 1
6.47840 1
6.86470 0
6.86470 0
7.25100 1
7.29560 1
7.29560 1
7.29560 1
7.29560 1
7.29560 1
7.35550 1
7.35550 1
7.35550 1
7.35550 1
7.44610 1
7.44610 1
7.49090 1
7.49090 1
7.49090 1
7.62700 1
7.62700 1
7.90000 0
8.49030 1
8.94460 1
8.94460 1
9.00490 1
9.00490 1
9.15580 1
9.15580 1
9.15580 1
9.23080 1
9.23080 1
9.23080 1
9.23080 1
9.28425 0
9.36630 1
9.36630 1
9.36630 1
9.36630 1
9.36630 1
9.88895 0
10.41160 1
10.42690 1
10.42690 1
10.42690 1
10.42690 1
10.42690 1
10.92640 1
10.99430 0
11.97870 0
12.75880 1
12.92510 1
12.92510 1
12.95540 0
13.00040 1
13.00040 1
13.84135 0
13.84135 0
13.84135 0
13.84135 0
13.84135 0
13.84135 0
13.84135 0
14.72750 1
14.78050 0
14.80000 1
14.80000 1
15.20425 0
15.20425 0
15.20425 0
15.49960 1
15.90000 0
15.90000 0
15.90000 0
15.90000 0
16.00715 0
16.00715 0
16.00715 0
16.00715 0
16.58980 1
16.58980 1
16.58980 1
17.20340 0
17.20340 0
17.20340 0
17.20340 0
17.20340 0
17.20340 0
17.20340 0
17.84710 1
17.89255 0
17.96790 1
18.39160 1
18.39160 1
18.39160 1
18.39160 1
18.55070 0
18.55070 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
19.48215 0
20.25440 1
20.52690 0
20.52690 0
20.79940 1
21.39020 0
21.39020 0
21.39020 0
21.39020 0
21.39020 0
21.39020 0
21.98060 1
22.13140 1
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
23.53290 0
24.97740 1
25.93200 0
25.93200 0
25.93200 0
25.93200 0

4.6 Visualization: Reconstructed Kaplan-Meier Survival Curve

Using the reconstructed and modified individual patient data, we fit a Kaplan-Meier survival curve and plot it below. This provides a direct comparison to the original published Figure S1.

Show R Code
# Create a survival object
surv_obj <- Surv(time = modified_reconstructed_ipd$t, event = modified_reconstructed_ipd$ev)

# Fit the Kaplan-Meier model
km_fit <- survfit(surv_obj ~ 1, data = modified_reconstructed_ipd)

# Plot the survival curve with the risk table
ggsurvplot(
  km_fit,
  data = modified_reconstructed_ipd,  # use the same post-QC dataset that km_fit was fit on
  risk.table = TRUE,               # Show number at risk table
  risk.table.y.text = TRUE,        # Add labels to risk table rows
  risk.table.height = 0.2,         # Adjust risk table size
  break.time.by = 4,               # X-axis breaks every 4 months
  xlab = "Time (Months)",
  ylab = "Survival Probability",
  surv.median.line = "hv",         # Add horizontal/vertical line at median survival
  ggtheme = theme_minimal(),
  legend = "none"
)

Show R Code
# Extract and print median survival time
median_time <- summary(km_fit)$table["median"]
cat("**Reconstructed Median PFS:**", round(median_time, 2), "months\n")
**Reconstructed Median PFS:** 14.8 months

5 Worked MAIC Example

5.1 Overview and Objectives

Having completed the reconstruction and validation of the pseudo-IPD from the published Kaplan-Meier curve, we now shift to the core application of these data: conducting a Matching-Adjusted Indirect Comparison.

The following worked example demonstrates, step-by-step, how to:

✅ Simulate time-to-event outcomes for the NIVO+RELA pseudo-cohort

✅ Estimate patient-level weights to adjust for baseline differences

✅ Compare outcomes against the comparator BRAF/MEK inhibitor cohort (DAB+TRAM)

This section mirrors the analytical framework used in our study and provides fully reproducible R code.

5.2 Simulate Time-To-Event Outcomes for Relativity 047 IPD

Next, we simulate the time-to-event outcome. Unlike the comparator datasets, where individual-level survival data were reconstructed from published Kaplan-Meier curves, we cannot access the actual patient-level data from the RELATIVITY-047 trial. Due to confidentiality and patient identification concerns, the dataset is not publicly available.

Additionally, digitizing the published RELATIVITY-047 survival curves is not feasible for this example because our analysis focuses specifically on the BRAF-mutant patient subset, which is not separately plotted or reported in the published Kaplan-Meier figures.

As a result, for the purpose of this worked example, we simulate the time-to-event outcome for the NIVO+RELA cohort. The simulated data serve as a stand-in for the actual patient-level data and allow us to demonstrate the MAIC process. Time-to-event outcomes are represented by two variables: a numeric time to event or censoring, and a binary event indicator. For simplicity, we use an exponential distribution to generate the event times.
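The hazard rate used below comes from the exponential median relation, median = log(2)/λ; a quick sanity check (illustrative numbers):

```r
assumed_median <- 60               # assumed median OS in months (illustrative)
lambda <- log(2) / assumed_median  # exponential hazard rate

# The implied median of an Exponential(lambda) distribution recovers the input
implied_median <- qexp(0.5, rate = lambda)
implied_median  # 60
```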

Show R Code
### Time-to-event outcomes (OS) ##
# Simulating patient-level data for OS
# Define hazard rate lambda (using a high survival time to avoid reaching the median)
assumed_os <- 60  # Assumed median OS (in months); set high so that the observed median is typically not reached within follow-up
lambda <- log(2) / assumed_os  # This defines the hazard rate for OS, assuming exponential survival; here we have a low event rate

# Generate event times from an exponential distribution
event_time <- rexp(N_patients_NR, rate = lambda) # Draws survival times from an exponential distribution using the previously defined hazard rate. This generates random survival durations for each patient.

# Generate censoring times from an exponential distribution with a smaller rate
lambda_censor <- lambda / 20  # The censoring hazard is 20 times smaller than the event hazard, so random censoring is relatively infrequent
censor_time <- rexp(N_patients_NR, rate = lambda_censor) # simulates censoring times.

# Ensure maximum time does not exceed 63.5
event_time[event_time > 63.5] <- 63.5 # Caps survival and censoring times at 63.5 months to avoid extreme outliers and ensure realistic follow-up durations.
censor_time[censor_time > 63.5] <- 63.5

# Force exactly 58 events: observed time is the earlier of the event and censoring times,
# and the 58 earliest observed times are labeled as events (the rest are censored)
observed_time <- pmin(event_time, censor_time)
event_indicator <- ifelse(rank(observed_time) <= 58, 1, 0)  # 1 = event, 0 = censored

# Create patient-level OS dataset
patient_data <- data.frame(
  ID = 1:N_patients_NR,
  OS_Time_Months = observed_time,
  OS_Event = event_indicator  # 1 = event, 0 = censored
)

# Check final event count (for visualization purposes)
cat("Final Event Count:", sum(patient_data$OS_Event), "\n")  # Should be 58
Final Event Count: 58 
Show R Code
# Fit Kaplan-Meier survival curve
surv_obj <- Surv(time = patient_data$OS_Time_Months, event = patient_data$OS_Event)
km_fit <- survfit(surv_obj ~ 1)

# Plot survival curve
ggsurvplot(
  km_fit,
  data = patient_data,
  conf.int = TRUE,
  xlim = c(0, 64.5),
  break.time.by = 10,
  risk.table = TRUE,
  risk.table.title = "",                    # ✅ Removes "Number at risk"
  xlab = "Time (Months)",
  ylab = "Survival Probability",
  title = "Overall Survival using Simulated Individual Patient Data",
  subtitle = "Kaplan-Meier Curve",
  legend = "none",
  ggtheme = theme_minimal(base_size = 16) +
    theme(
      plot.title = element_text(hjust = 0.5, size = 18, face = "bold", color = "black"),
      plot.subtitle = element_text(hjust = 0.5, size = 16, color = "black")
    ),
  risk.table.y.text.col = TRUE,
  risk.table.y.text = FALSE,
  risk.table.height = 0.25,
  font.x = c(16, "plain", "black"),
  font.y = c(16, "plain", "black"),
  font.tickslab = c(14, "plain", "black")
)

Show R Code
## Merge time-to-event outcome with rest of IPD ##
NR.IPD <- NR.IPD %>% mutate(ID = as.numeric(ID)) %>% left_join(patient_data)

5.3 Load Comparator Pseudo-IPD: DAB+TRAM

For the DAB+TRAM cohort, we rely on summary statistics for baseline characteristics and binary outcomes reported in the trial publications. The time-to-event data (overall survival) have already been reconstructed using the Kaplan-Meier curve digitization and reconstruction approach described earlier in this document.

Specifically, the individual patient-level survival data for DAB+TRAM were generated from the published Kaplan-Meier curves using the Guyot et al. (2012) algorithm. This reconstruction process transformed the digitized survival probabilities and number-at-risk tables into a synthetic dataset approximating the original trial results.

For this worked example, we load the resulting pseudo-IPD directly from a CSV file containing the reconstructed event times and censoring indicators for each individual.

Show R Code
# Summary level data for comparator, DAB+TRAM
N_DT <- 563  # Total number of patients
## Baseline characteristics (frequencies) ##
age_above_55_DT <- 0.5
female_pc_DT <- 0.43339254
ecog_0_pc_DT <- 0.715808
ldh_great_uln_pc_DT <- 0.344583
disease_sites_above3_pc_DT <- 0.488455
mets_M0_M1a_pc_DT <- 0.1669627
mets_M1b_pc_DT <- 0.186500888
immunotherapy_DT <- 0.207815
# Outcomes
ORR_investigator_DT <- 0.680284  # Objective Response Rate (investigator assessed)

# Read in comparator pseudo-IPD from CSV file provided
data_path <- here::here("files/dab tram/")
file_name <- "Pseudo IPD DabrTram.csv"
file_path <- paste0(data_path, file_name)
#file.exists(file_path)
Pseudo.DT <- read.csv(file_path) %>% 
  mutate(wt = 1, trt = "DAB+TRAM") 

# Create aggregate dataset for comparator (using frequencies, as above)
Agg.DT <- data.frame(
  n = N_DT,
  age_above_55 = age_above_55_DT,
  gender_female = female_pc_DT,
  ecog_0 = ecog_0_pc_DT,
  ldh_great_uln = ldh_great_uln_pc_DT,
  disease_sites_above3 = disease_sites_above3_pc_DT,
  mets_M0_M1a = mets_M0_M1a_pc_DT,
  immunotherapy = immunotherapy_DT,
  ORR_investigator = ORR_investigator_DT
  )

5.4 Matching-Adjusted Indirect Comparisons

The approach presented here follows the appendix from Phillippo et al. (2016), and the text has been adapted to our example. MAIC is a technique that adjusts for differences in baseline characteristics between two treatment groups when direct head-to-head trials are unavailable.

5.4.1 Estimating Weights for Each Patient

To ensure that our NIVO+RELA cohort better reflects the characteristics of the DAB+TRAM cohort, we assign each patient a weight that reflects how well their baseline characteristics align with the comparator group. This weight is determined using a logistic propensity score model, a statistical model similar to logistic regression.

The model assigns weights based on key patient characteristics (such as age, sex, ECOG status, and LDH levels) so that, after weighting, the average baseline characteristics of the NIVO+RELA group match those of the DAB+TRAM group. The weighting model is defined as follows:
\[ \log(w_{i}) = \alpha_0 + \alpha_1^T \mathbf{X}_{i} \] where:

  • \(w_{i}\) is the weight assigned to each patient \(i\) in the NIVO+RELA group
  • \(X_{i}\) represents the patient’s baseline characteristics
  • \(\alpha_0\) and \(\alpha_1\) are parameters that are estimated to achieve balance between the groups

5.4.2 Conceptual Explanation of the Weighting Approach

  • \(X_{i}\) is a vector of patient characteristics, such as age and sex.
  • \(\alpha_1^T \mathbf{X}_{i}\) means each characteristic is multiplied by a coefficient (an element of \(\alpha_1\)) that determines its importance in the adjustment process.
  • Since the model is expressed in terms of the log of the weights, the actual weight values are obtained later by exponentiating the final result (see below). This ensures all weights remain positive.

Since our goal is to match the distributions of these characteristics across cohorts, we solve for the best values of \(\alpha_1\) using an optimization technique (the BFGS method). This ensures that after weighting, the adjusted NIVO+RELA group is statistically similar to DAB+TRAM in terms of baseline characteristics.

The weights are estimated using the method of moments to match the distributions of the baseline characteristics between cohorts. This problem is equivalent to minimizing the following objective function:
\[ \sum_{i} \exp(\alpha_1^T \mathbf{X}_{i}) \] under the constraint that the covariates have been centered on the comparator means, i.e.:
\[\bar{\mathbf{X}}_{(DAB+TRAM)} = \mathbf{0}\]

This means that, after weighting, the average values of the baseline characteristics in the NIVO+RELA group must match those in the DAB+TRAM group.

To achieve this, we center the covariate values by subtracting the mean baseline characteristics of the DAB+TRAM group from the patient data in both trials. This ensures that the two groups are aligned.

The adjustment parameters \(α_{1}\) are then estimated using a numerical optimization method called BFGS (named after its developers: Broyden, Fletcher, Goldfarb, and Shanno). We do not need to estimate \(α_{0}\) , as this term cancels out in the calculations.

Once these adjustments are applied, the estimated weights for each patient are calculated as: \[ \hat{w}_{i} = \exp\left( \hat{\alpha}_1^T \bar{\mathbf{X}}_{i} \right), \] where:

  • \(\hat{w}_{i}\) (“w hat sub i”) represents the final weight assigned to patient \(i\). The hat notation (\(\hat{\cdot}\)) indicates that this is an estimated value, meaning it has been derived from the optimization process.
  • \(\bar{\mathbf{X}}_{i}\) represents the centered version of patient \(i\)’s baseline characteristics. The bar notation (\(\bar{\cdot}\)) means that we have subtracted the mean values of the DAB+TRAM group from the patient’s original characteristics.
  • \(\hat{\alpha}_1\) is a vector of estimated adjustment parameters. Each element of \(\hat{\alpha}_1\) tells us how strongly a given characteristic (e.g., age, sex, ECOG status) influences the weighting.
  • As stated above, the model was built in terms of the log of the weights, so we apply the exponential function to recover the actual weight values.
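As a self-contained numerical sanity check of this optimization (written in Python with made-up numbers, separate from the R workflow below), we can minimize \(\sum_{i} \exp(\alpha_1^T \mathbf{X}_{i})\) over centered covariates and confirm that the resulting weighted means hit the comparator targets:

```python
# Toy check of the weighting logic: minimize sum(exp(X @ a1)) over covariates
# centered on the comparator means, then verify balance. Illustrative data only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X_ipd = rng.binomial(1, [0.6, 0.4], size=(200, 2)).astype(float)  # two binary covariates
target_means = np.array([0.5, 0.5])   # hypothetical comparator (aggregate) means
X = X_ipd - target_means              # centering step: subtract comparator means

objfn = lambda a1: np.exp(X @ a1).sum()    # objective: sum of exponentiated scores
gradfn = lambda a1: X.T @ np.exp(X @ a1)   # its analytic gradient
opt = minimize(objfn, x0=np.zeros(2), jac=gradfn, method="BFGS")

w = np.exp(X @ opt.x)                 # estimated (unscaled) weights
weighted_means = (w[:, None] * X_ipd).sum(axis=0) / w.sum()
print(weighted_means)                 # ≈ [0.5, 0.5]: balance achieved
```

At the optimum the gradient is zero, which is exactly the method-of-moments condition that the weighted, centered covariate sums vanish.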

5.4.3 Interpreting the Weights

Once we estimate the weights, we can rescale them to better understand how individual patients contribute to the adjusted comparison. The rescaled weights are defined as:

\[ \tilde{w}_{i} = \frac{\hat{w}_{i}}{\sum_{i} \hat{w}_{i}} \cdot N_{(NIVO+RELA)} \]

  • A rescaled weight >1 means a patient is overrepresented in the weighted analysis.
  • A rescaled weight <1 means a patient is underrepresented compared to the original cohort.
  • A weight close to 1 means the patient’s influence remains largely unchanged.
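A quick numerical check of the rescaling formula (a Python sketch with made-up weights): the rescaled weights always sum to \(N\), so a value of 1 corresponds to average influence.

```python
# Rescaling demo: divide by the total weight, multiply by N. Made-up weights.
import numpy as np

w_hat = np.array([0.2, 0.9, 0.4])     # hypothetical raw MAIC weights
n = len(w_hat)
w_tilde = w_hat / w_hat.sum() * n     # rescale: sum(w_tilde) == n
print(w_tilde)                        # ≈ [0.4 1.8 0.8]; the second patient is overrepresented
```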

This method ensures that, after weighting, the distribution of key baseline characteristics in NIVO+RELA matches that of DAB+TRAM, allowing for a fairer, less biased comparison.

Below we create the weights function:

Show R Code
#Function to generate weights (one arm at a time)
 weights <- function(ipd, sld, continuous_vars, binary_vars) {
   ## Explanation of the arguments:
   ###   ipd → The individual patient data (NIVO+RELA group).
   ###   sld → The summary-level data (baseline characteristics of DAB+TRAM).
   ###   continuous_vars → A vector of continuous variables (e.g., age, lab values).
   ###   binary_vars → A vector of binary variables (e.g., sex, ECOG status).
   
   
   # Define objective function for use in optimization process
   ## Notes: This function computes the sum of exponentiated weighted patient characteristics to find the best 
   ##        α1 values that ensure the weighted NIVO+RELA population matches the DAB+TRAM cohort
   objfn <- function(a1, X) {
     sum(exp(X %*% a1))
   }
   
   # Define gradient function for use in optimization process
   gradfn <- function(a1, X) {
     colSums(sweep(X, 1, exp(X %*% a1), "*"))
   }
   
   # Collect the matching variables (continuous first, then binary) from the IPD
   variables <- list()
   for (var in c(continuous_vars, binary_vars)) {
     variables <- c(variables, list(ipd[[var]]))
   }
   
   # Combine all variables into a single matrix
   X.EM.0 <- do.call(cbind, variables)
   
   # Prepare the comparator means for centering (same order as the columns above)
   means <- unlist(sld[c(continuous_vars, binary_vars)])
   
   # Center the variables
   X.EM.0 <- sweep(X.EM.0, 2, means, '-')
   
   # Optimization step to estimate alpha
   ## Notes: Initial value for "par" is a vector of zeros
   ##        X is passed to both objfn and gradfn as an additional argument
   opt1 <- optim(par = rep(0, ncol(X.EM.0)), fn = objfn, gr = gradfn, X = X.EM.0, 
                 method = "BFGS",  control = list(maxit = 50000))
   if (opt1$convergence != 0) stop("Algorithm failed to converge")
   a1 <- opt1$par
   wt <- exp(X.EM.0 %*% a1)   # Calculate weights using estimated alpha parameters
   wt.rs <- (wt / sum(wt)) * nrow(ipd)   # Rescale weights
   
   # Combine ID, wt.rs, and wt into a data frame 
   # Note: the order of wt and wt.rs matches ipd since no sorting is done
   result <- data.frame(ID = ipd$ID, wt.rs = wt.rs, wt = wt) 
   return(result)
 }

5.4.4 Variable matching

After defining the weighting function, the next step is to specify which baseline characteristics will be used to match the two cohorts (NIVO+RELA vs. DAB+TRAM).

Show R Code
#Set variables for matching
continuous_vars <- c() # No continuous variables used in this example, so we have an empty vector
binary_vars <- c("age_above_55","gender_female","ecog_0","ldh_great_uln",
                 "disease_sites_above3","mets_M0_M1a","immunotherapy")
# age_above_55 → Whether a patient is older than 55 (0 = No, 1 = Yes).
# gender_female → Whether the patient is female (0 = Male, 1 = Female).
# ecog_0 → Whether the patient has an ECOG performance status of 0 (0 = ECOG 1+, 1 = ECOG 0).
# ldh_great_uln → Whether the patient has LDH above the upper limit of normal (ULN) (0 = No, 1 = Yes).
# disease_sites_above3 → Whether the patient has ≥3 sites of disease (0 = <3 sites, 1 = 3+ sites).
# mets_M0_M1a → Whether the patient has M0 or M1a metastatic status (0 = No, 1 = Yes).
# immunotherapy → Whether the patient received prior immunotherapy (0 = No, 1 = Yes).

5.4.5 Generate Weights for Nivo + Rela

Show R Code
# Call weight function to generate (unanchored) weights
weights_unanchored_ <- weights(ipd = NR.IPD, sld = Agg.DT, continuous_vars, binary_vars)
# Generate the GT table for patient-level unanchored weights
weights_unanchored_table <- weights_unanchored_ %>%
  head() %>%
  gt() %>%
  tab_header(
    title = html("<b>Patient-Level Unanchored Weights from MAIC for NIVO+RELA</b>")
  ) %>%
  cols_align(align = "center", columns = everything())

# Render inside a scrollable HTML div
htmltools::div(
  style = "overflow-x: auto; height: 400px;",
  weights_unanchored_table
)
Patient-Level Unanchored Weights from MAIC for NIVO+RELA
ID wt.rs wt
1 1.1786523 1.0388435
2 0.7099735 0.6257582
3 0.5214058 0.4595580
4 0.3283765 0.2894253
5 0.9709276 0.8557586
6 0.4929441 0.4344723

5.4.6 Merge Weights with Baseline and Outcome Data

Show R Code
# Merge the generated weights with the baseline and outcome data
weights_unanchored <- full_join(weights_unanchored_, NR.IPD, by = "ID")

# Create the GT table to preview merged data
weights_unanchored_table <- weights_unanchored %>%
  head() %>%
  gt() %>%
  tab_header(
    title = html("<b>Patient-Level Unanchored Weights Merged with Baseline and Outcome Data</b>")
  ) %>%
  cols_align(align = "center", columns = everything())

# Render inside a scrollable HTML div
htmltools::div(
  style = "overflow-x: auto; height: 400px;",
  weights_unanchored_table
)
Patient-Level Unanchored Weights Merged with Baseline and Outcome Data
ID wt.rs wt n age_above_55 gender_female ecog_0 ldh_great_uln disease_sites_above3 mets_M0_M1a immunotherapy ORR_investigator trt OS_Time_Months OS_Event
1 1.1786523 1.0388435 136 1 1 1 0 0 0 1 0 NIVO+RELA 63.5000000 0
2 0.7099735 0.6257582 136 1 0 1 0 1 1 0 1 NIVO+RELA 33.8213046 1
3 0.5214058 0.4595580 136 0 1 0 0 1 1 0 0 NIVO+RELA 63.5000000 0
4 0.3283765 0.2894253 136 1 1 1 0 0 1 0 1 NIVO+RELA 63.5000000 0
5 0.9709276 0.8557586 136 1 0 1 0 0 0 0 1 NIVO+RELA 0.8996455 1
6 0.4929441 0.4344723 136 0 0 1 1 0 1 0 1 NIVO+RELA 63.5000000 0

5.4.7 Summary of the Weight Distribution

Now that we have generated patient-level weights and merged them with the baseline data, the next step is to evaluate the distribution of these weights. This helps ensure that the weighting process is not excessively distorting the dataset and that no individual patient is overrepresented or underrepresented to an extreme degree. The code snippet below provides summary statistics and a histogram of the rescaled weights. Here, the rescaled weights range from 0.3 to 2.5, and the histogram confirms that there are no very large weights.

Show R Code
# Summary of weights, histogram
summary(weights_unanchored$wt.rs)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3031  0.5903  0.9231  1.0000  1.2634  2.5483 
Show R Code
plot_anchored <- ggplot(weights_unanchored, aes(x = wt.rs)) +
  geom_histogram(binwidth = 0.25, fill = "dark blue") +
  labs(x = "Rescaled MAIC Weights", y = "Count") +
  scale_x_continuous(expand = c(0, 0), limits = c(0, NA)) + 
  scale_y_continuous(expand = c(0, 0), limits = c(0, NA)) +
  theme_classic()
plot_anchored 

5.4.8 Assess Weight Distribution and Effective Sample Size (ESS)

5.4.8.1 Calculating the Effective Sample Size (ESS)

After applying weights to adjust for differences between the NIVO+RELA and DAB+TRAM cohorts, we need to determine how many patients still contribute meaningfully to the final analysis. The Effective Sample Size (ESS) provides this insight.

The ESS is calculated using the following formula:

\[ ESS = \frac{\left( \sum_{i} \hat{w}_{i} \right)^2}{\sum_{i} \hat{w}_{i}^2} \]

This formula accounts for how weights are distributed across patients. An ESS lower than the original sample size (136 patients) indicates that the weights are unevenly distributed, so that a few highly weighted patients dominate the analysis while others contribute little.

5.4.8.2 Why ESS Matters

  • If the ESS is too low, the weighted analysis is effectively based on fewer patients, which may reduce statistical power and make the results less reliable.
  • If the ESS remains close to the original sample size (136), most patients still contribute meaningfully to the final results, ensuring robust comparisons between the two treatment groups.
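Two limiting cases make the formula concrete (a Python sketch with illustrative numbers): equal weights give \(ESS = n\), while a single dominant weight drives the ESS toward 1.

```python
# ESS = (sum w)^2 / sum(w^2): behavior at the two extremes, illustrative weights.
import numpy as np

def ess(w):
    return w.sum() ** 2 / (w ** 2).sum()

print(ess(np.ones(136)))                      # 136.0: equal weights, no loss of information
print(ess(np.array([10.0, 0.1, 0.1, 0.1])))   # ≈ 1.06: one patient dominates the analysis
```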
Show R Code
# Effective sample size (ESS)
## ESS for NIVO+RELA
ESS_NR <- sum(weights_unanchored_$wt )^2/sum(weights_unanchored_$wt ^2)
ESS_NR
[1] 107.8514

An ESS of 107.9 represents a moderate reduction from the original 136 patients but remains reasonable. This means that, even after reweighting, a sufficient number of patients still effectively contribute to the comparison.

After weighting, all matched variables are now balanced between the cohorts. For example, the proportion of females in the NIVO+RELA group now exactly matches that of the DAB+TRAM cohort, confirming that the weighting process successfully adjusted for baseline differences.

Show R Code
weights_unanchored %>%
  summarise(gender.mean = weighted.mean(gender_female, wt))
  gender.mean
1   0.4333925
Show R Code
Agg.DT[, c("gender_female")]
[1] 0.4333925

5.4.9 Using MAIC for Relevant Outcomes

Now that we have adjusted the NIVO+RELA cohort using patient-level weights, we can compare treatment outcomes between NIVO+RELA and DAB+TRAM. Specifically, we analyze objective response rate (ORR) and the hazard ratio (HR) for overall survival (OS) before and after weighting to assess how weighting impacts treatment comparison.

5.4.9.1 Binary outcome (ORR) comparison

In this section, we estimate the ORR for NIVO+RELA, both before and after weighting, to ensure a fair comparison with DAB+TRAM.

  • Before weighting: The raw ORR reflects the observed response rate in the original NIVO+RELA cohort.
  • After weighting: The ORR is adjusted using weights that make the NIVO+RELA cohort resemble the DAB+TRAM cohort in terms of key baseline characteristics.

We use logistic regression models to estimate ORR before and after weighting and then compare it with the DAB+TRAM cohort using odds ratios (ORs).

5.4.9.1.1 Step 1: Estimating ORR for the Comparator (DAB+TRAM)

First, we extract the reported ORR for the DAB+TRAM group from the summary data and compute its standard error (SE) using a standard formula for proportions:

Show R Code
# Get DAB+TRAM binary outcome (ORR) rate and SE
ORR_DT <- Agg.DT$ORR_investigator  # Extracts reported ORR for DAB+TRAM
ORR_se_DT <- sqrt(Agg.DT$ORR_investigator * (1 - Agg.DT$ORR_investigator)/Agg.DT$n)

The standard error (SE) formula used is: \[ SE = \sqrt{\frac{p(1−p)}{n}} \] where:

  • \(p\) = the ORR.
  • \(n\) = sample size.
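As a quick numerical illustration (Python; the values of \(p\) and \(n\) below are made up, not the actual trial figures):

```python
# SE of a proportion: sqrt(p * (1 - p) / n), with illustrative inputs.
import math

p, n = 0.45, 352
se = math.sqrt(p * (1 - p) / n)
print(round(se, 4))   # 0.0265
```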
5.4.9.1.2 Step 2: Estimating ORR for NIVO+RELA (Before Weighting)

Before applying MAIC weights, we estimate the unadjusted ORR for NIVO+RELA using a logistic regression model.

Show R Code
# Unweighted logistic regression model
model_pre = glm(ORR_investigator ~ 1, 
                data = weights_unanchored, 
                family = binomial()) 

Why logistic regression?

  • Since ORR is a binary outcome (responded vs. not responded), logistic regression is used to model probabilities for binary outcomes.
  • The ~1 notation means we are estimating a single overall probability for response without considering additional variables (i.e., it’s an intercept-only model).
  • This model outputs log-odds, which we then convert into probabilities using the inverse logit transformation.
5.4.9.1.2.1 Extracting the Variance and Transforming the Estimate
Show R Code
# Variance-covariance matrix (model-based here; the sandwich correction matters once weights are applied)
V.sw_pre <- vcov(model_pre)  # Extracts variance-covariance matrix
trt_label_pre = "(Intercept)"  # Intercept corresponds to the log-odds of ORR

# Extract log-odds and convert to probability (ORR)
mean_trt_pre = model_pre$coefficients[trt_label_pre]
v_trt_pre = V.sw_pre[trt_label_pre, trt_label_pre]

# Convert log-odds to probability (inverse logit transformation)
mean_trt_pre = exp(mean_trt_pre) / (1 + exp(mean_trt_pre))

# Compute standard error using the delta method
se_trt_pre = sqrt((v_trt_pre / ((1 / (mean_trt_pre * (1 - mean_trt_pre)))^2)))
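The last line above is the delta method applied to the inverse-logit transformation: if \(\hat{\eta}\) is the estimated log-odds with variance \(v\), and \(\hat{p} = e^{\hat{\eta}}/(1+e^{\hat{\eta}})\), then

\[ SE(\hat{p}) \approx \left| \frac{d\hat{p}}{d\hat{\eta}} \right| \sqrt{v} = \hat{p}(1-\hat{p})\sqrt{v} = \sqrt{ \frac{v}{\left( 1/[\hat{p}(1-\hat{p})] \right)^2} }, \]

which is exactly the expression computed for se_trt_pre.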
5.4.9.1.3 Why Use the Sandwich Estimator?

The sandwich estimator corrects for bias in standard error estimation caused by weighting.

  • Without it, standard errors could be underestimated, making confidence intervals too narrow and giving a false sense of precision.
  • Why does this happen? When we apply MAIC weights, some patients contribute more than others, breaking the assumption that all observations contribute equally to the analysis.
  • Using the sandwich estimator ensures that our uncertainty (SE) is correctly adjusted for this imbalance.
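The direction of this bias can be illustrated with a simple effective-sample-size argument (a Python sketch on simulated numbers; the sandwich estimator itself is more involved, this only shows why the naive SE is too small when weights vary):

```python
# Naive SE treats all n observations as equal; an ESS-based SE accounts for
# unequal weights and is wider. Simulated data for illustration only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.4, 500).astype(float)   # simulated binary responses
w = rng.gamma(2.0, 1.0, 500)                  # simulated, unequal weights
p_hat = np.average(x, weights=w)

naive_se = np.sqrt(p_hat * (1 - p_hat) / len(x))   # pretends all 500 count equally
ess = w.sum() ** 2 / (w ** 2).sum()                # effective sample size < 500
adj_se = np.sqrt(p_hat * (1 - p_hat) / ess)        # wider, ESS-based SE

print(naive_se < adj_se)   # True: ignoring weight variability understates uncertainty
```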

5.4.9.2 Step 3: Estimating ORR for NIVO+RELA (After Weighting)

We now apply weights to adjust for baseline differences before estimating ORR.

Show R Code
# Weighted logistic regression model
model_post = svyglm(
  ORR_investigator ~ 1,  # Model estimates ORR as a single proportion
  design = svydesign(    
    id = ~1,  # No clustering (each row is an independent patient)
    weights = ~weights_unanchored[["wt"]],  # Uses MAIC weights
    data = weights_unanchored),  
  family = binomial(),  # Logistic regression for binary outcome
  na.action = na.omit   # Removes missing values
)
5.4.9.2.0.1 Extracting the Weighted ORR Estimate
Show R Code
# Sandwich estimator of variance
V.sw_post <- vcov(model_post)
trt_label_post = "(Intercept)"

# Extract the estimated coefficient (log-odds of treatment effect) and its variance
mean_trt_post = model_post$coefficients[trt_label_post]
v_trt_post = V.sw_post[trt_label_post, trt_label_post]

# Convert log-odds to probability (inverse logit transformation)
mean_trt_post = exp(mean_trt_post) / (1 + exp(mean_trt_post))

# Compute standard error of the transformed estimate (using delta method as above)
se_trt_post = sqrt((v_trt_post / ((1 / (mean_trt_post * (1 - mean_trt_post)))^2)))

5.4.9.3 Calculating the Odds Ratio (OR) Between Treatments

Now that we have weighted and unweighted ORRs, we compare NIVO+RELA vs. DAB+TRAM using odds ratios (ORs). The following code creates a function, get_comparison_arms, that produces the odds ratio of NIVO+RELA vs. DAB+TRAM using the mean event rates and SEs calculated above. Because only the NIVO+RELA cohort is weighted in this MAIC, the rates for the DAB+TRAM cohort remain the same in both the before- and after-matching comparisons. The function returns a data frame with the OR, SE, confidence intervals (CI), and p-value.

Show R Code
# Function to compute OR and confidence intervals
get_comparison_arms <- function (mean1, mean2, se1, se2) {
    OR <- (mean1 / (1 - mean1)) / (mean2 / (1 - mean2))  # Compute OR
    logOR <- log(OR)  # Convert to log scale
    logOR_se <- sqrt(se1^2 / mean1^2 + se1^2 / ((1 - mean1)^2) + 
                     se2^2 / mean2^2 + se2^2 / ((1 - mean2)^2))  # Compute SE of log OR
    logOR_2.5 <- logOR - qnorm(0.975) * logOR_se  # 95% CI lower bound
    logOR_97.5 <- logOR + qnorm(0.975) * logOR_se  # 95% CI upper bound
    out_2.5 <- exp(logOR_2.5)  # Convert back to OR scale
    out_97.5 <- exp(logOR_97.5)
    pvalue <- pnorm(abs(logOR / logOR_se), lower.tail = FALSE) * 2  # Compute p-value 
                                                                ## Note: we did not use p-values in the process, 
                                                                ##       this is for example only

    output <- data.frame(OR, logOR_se, out_2.5, out_97.5, pvalue)
    colnames(output) <- c("OR", "SE", "LL", "UL", "P-value")
    return(output)
}
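The SE expression in the function follows from the delta method: writing \(\log \mathrm{OR} = [\log p_1 - \log(1-p_1)] - [\log p_2 - \log(1-p_2)]\) and using \(\mathrm{Var}[\log p] \approx SE_p^2/p^2\) and \(\mathrm{Var}[\log(1-p)] \approx SE_p^2/(1-p)^2\), then treating the four log terms as uncorrelated (the approximation the code makes) gives

\[ SE(\log \mathrm{OR}) \approx \sqrt{ \frac{SE_1^2}{p_1^2} + \frac{SE_1^2}{(1-p_1)^2} + \frac{SE_2^2}{p_2^2} + \frac{SE_2^2}{(1-p_2)^2} }, \]

which is the quantity stored in logOR_se.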

Running the Comparison Function

Show R Code
# OR before and after weighting
compIpdPre = get_comparison_arms(
  mean1 = mean_trt_pre,  # ORR for NIVO+RELA before weighting
  mean2 = ORR_DT,        # ORR for DAB+TRAM (reported value)
  se1 = se_trt_pre,      # Standard error of ORR for NIVO+RELA before weighting
  se2 = ORR_se_DT        # Standard error of ORR for DAB+TRAM
)

compIpdPost = get_comparison_arms(
  mean1 = mean_trt_post,  # ORR for NIVO+RELA after weighting
  mean2 = ORR_DT,         # ORR for DAB+TRAM (reported value)
  se1 = se_trt_post,      # Standard error of ORR for NIVO+RELA after weighting
  se2 = ORR_se_DT         # Standard error of ORR for DAB+TRAM
)

# Print OR results
print(compIpdPre)  # Before weighting
                   OR        SE        LL        UL      P-value
(Intercept) 0.3710323 0.1404724 0.2817354 0.4886322 1.688159e-12
Show R Code
print(compIpdPost) # After weighting
                   OR        SE        LL        UL      P-value
(Intercept) 0.3826066 0.1540082 0.2829179 0.5174215 4.423721e-10
5.4.9.3.1 Interpretation of Odds Ratio

The odds ratios (ORs) are used to compare how likely patients in each cohort are to respond to treatment.

  • OR > 1 → NIVO+RELA patients are more likely to respond than DAB+TRAM.
  • OR < 1 → NIVO+RELA patients are less likely to respond than DAB+TRAM.
  • OR = 1 → No difference in response likelihood.

The before and after weighting comparison tells us how much adjusting for baseline imbalances affects the results. If the OR shifts after weighting, it suggests that the observed differences before weighting were partially due to differences in patient characteristics, rather than a true treatment effect.

5.4.9.4 Time-to-event outcome (OS) comparison

For time-to-event outcomes, we need patient-level survival data for both cohorts:

  1. NIVO+RELA (IPD) → Simulated patient-level data.
  2. DAB+TRAM (Comparator) → Reconstructed from published Kaplan-Meier curves

The Cox proportional hazards (Cox PH) model is used to estimate the hazard ratio (HR), which compares the risk of an event (e.g., death or disease progression) over time between the two treatment groups. The analysis is performed before and after weighting, allowing us to assess how adjusting for baseline differences affects the HR.
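Formally, the Cox model specifies the hazard at time \(t\) as

\[ h(t \mid \text{trt}) = h_0(t)\, \exp(\beta \cdot \text{trt}), \]

where \(h_0(t)\) is an unspecified baseline hazard and \(\text{trt}\) indicates NIVO+RELA vs. DAB+TRAM. The treatment coefficient \(\beta\) is estimated by (weighted) partial likelihood, and the hazard ratio is \(\mathrm{HR} = \exp(\beta)\).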

5.4.9.4.1 Step 1: Preparing the Survival Data

We first merge the simulated NIVO+RELA dataset and reconstructed DAB+TRAM dataset into a single dataframe for analysis.

Show R Code
# Extract relevant variables from weighted IPD
tte_ipd <- weights_unanchored %>% select(ID, trt, OS_Time_Months, OS_Event, wt) 

# Combine NIVO+RELA and DAB+TRAM survival data
tte_os <- rbind(Pseudo.DT, tte_ipd)

Here:

  • OS_Time_Months → The number of months until the event (or censoring).
  • OS_Event → Binary variable (1 = event occurred, 0 = censored).
  • trt → Treatment group (NIVO+RELA vs. DAB+TRAM).
  • wt → Weighting factor (only relevant after MAIC adjustment)
5.4.9.4.2 Step 2: Cox PH Model Before Weighting

Before applying MAIC weights, we estimate the unadjusted hazard ratio (HR) to compare survival outcomes.

Show R Code
# Cox PH model comparing NIVO+RELA vs. DAB+TRAM (before weighting)
NivoRela_DabrTram_unadj <- coxph(Surv(OS_Time_Months, OS_Event) ~ trt, data = tte_os)

# Extract key results
NivoRela_DabrTram_unadj_logHR <- coef(NivoRela_DabrTram_unadj)  # Log hazard ratio
NivoRela_DabrTram_unadj_logSE <- sqrt(NivoRela_DabrTram_unadj$var[1,1])  # Log HR standard error
NivoRela_DabrTram_unadj_HR <- exp(coef(NivoRela_DabrTram_unadj))  # Convert logHR to HR
NivoRela_DabrTram_unadj_p <- summary(NivoRela_DabrTram_unadj)$coefficients[5]  # p-value

# Print results
print(NivoRela_DabrTram_unadj_HR)  # HR before weighting
trtNIVO+RELA 
   0.4849858 
Show R Code
print(NivoRela_DabrTram_unadj_p)   # P-value before weighting
[1] 3.661652e-07

Key Takeaways:

  • The Cox model estimates the HR → HR < 1 means NIVO+RELA reduces risk, while HR > 1 means DAB+TRAM has lower risk.
  • This initial estimate is unadjusted → It does not account for differences in baseline characteristics.
5.4.9.4.3 Step 3: Cox PH Model After Weighting

Now we apply MAIC weights to ensure that the weighted NIVO+RELA cohort matches the DAB+TRAM cohort in baseline characteristics. This allows for a fairer comparison.

Show R Code
# Cox PH model after weighting (accounting for baseline differences)
NivoRela_DabrTram_adj <- coxph(Surv(OS_Time_Months, OS_Event) ~ trt, data = tte_os, 
                               weights = wt) 

# Extract key results
NivoRela_DabrTram_adj_logHR <- coef(NivoRela_DabrTram_adj)  # Log hazard ratio
NivoRela_DabrTram_adj_logSE <- sqrt(NivoRela_DabrTram_adj$var[1,1])  # Log HR standard error
NivoRela_DabrTram_adj_HR <- exp(coef(NivoRela_DabrTram_adj))  # Convert logHR to HR
NivoRela_DabrTram_adj_p <- summary(NivoRela_DabrTram_adj)$coefficients[6]  # p-value

# Print results
print(NivoRela_DabrTram_adj_HR)  # HR after weighting
trtNIVO+RELA 
   0.4968978 
Show R Code
print(NivoRela_DabrTram_adj_p)   # P-value after weighting
[1] 4.384994e-06

5.4.10 Key Takeaways

Tip

Why Is This Important?

  • After weighting, the NIVO+RELA cohort should be better balanced with the DAB+TRAM cohort in terms of their patient characteristics.
  • A shift in HR after weighting indicates that some of the initial differences in survival were due to imbalances in baseline characteristics, rather than treatment effect alone.

6 Conclusion

This supplemental document provides a detailed, reproducible framework for performing MAICs using simulated and reconstructed patient-level data. We demonstrated core steps of the methodology, including:

✅ Extraction and digitization of published Kaplan-Meier curves

✅ Reconstruction of individual patient-level survival data

✅ Simulation of patient-level data when original datasets are unavailable

✅ Application of MAIC weighting to adjust for baseline imbalances

✅ Comparison of binary and time-to-event outcomes across treatment cohorts

Through this worked example, we illustrate how these techniques can support comparative effectiveness research when direct head-to-head trial data are not available. Importantly, this example incorporates simulated patient-level data for the NIVO+RELA cohort, used solely for illustrative purposes due to the unavailability of individual patient data from RELATIVITY-047.

While the data presented here do not represent real patients or actual trial results, the analytical approach, code, and workflow provide a foundation that can be adapted to other MAIC applications. This framework is generalizable to other treatment comparisons, outcomes, and therapeutic areas, offering a flexible toolset for researchers navigating the challenges of indirect comparisons.

For validated results and interpretations, please refer to our main publication. Researchers interested in applying these methods to real-world data are strongly encouraged to consult the primary literature and statistical experts to ensure rigor and accuracy.

7 References

  1. Dummer R, et al. Encorafenib plus binimetinib versus vemurafenib or encorafenib in patients with BRAF-mutant melanoma (COLUMBUS): a multicentre, open-label, randomised phase 3 trial. Lancet Oncology. 2018;19(5):603-615. doi:10.1016/S1470-2045(18)30142-6
  2. Guyot P, Ades A, Ouwens M, Welton N. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan–Meier survival curves. BMC Med Res Methodol. 2012;12:9. doi:https://doi.org/10.1186/1471-2288-12-9
  3. Phillippo D, Ades T, Dias S, Palmer S, Abrams K, Welton N. NICE DSU technical support document 18: methods for population-adjusted indirect comparisons in submissions to NICE. NICE Decision Support Unit. 2016.
  4. Signorovitch J, Ayyagari R, Cheng D, Wu E. Matching-adjusted indirect comparisons: a simulation study of statistical performance. Value Health. 2013;16(3):A48.

8 Disclosures

This supplemental methods and worked example were developed with the assistance of OpenAI to generate code and text, helping to structure explanations, annotate key concepts, and improve readability. While all statistical methodologies, data interpretations, and results remain the responsibility of the authors, AI-assisted tools were used to enhance clarity and streamline the documentation of this MAIC approach. Users should refer to the original publication for validated results and consult with statistical experts before applying these methods to real-world data.